Abstract

The Italian National Scientific Qualification (ASN) was introduced as a prerequisite for applying for tenured associate or full professor positions at state-recognized universities. The ASN is meant to attest that an individual has reached a level of scientific maturity suitable for applying for professorship positions. A five-member panel, appointed for each scientific discipline, evaluates applicants by means of quantitative indicators of impact and productivity, and through an assessment of their research profile. Many concerns were raised about the appropriateness of the evaluation criteria, and in particular about the use of bibliometrics for the evaluation of individual researchers. Additional concerns related to the perceived poor quality of the final evaluation reports. In this paper we assess the ASN in terms of the appropriateness of the applied methodology and the quality of the feedback provided to applicants. We argue that the ASN is not fully compliant with the best practices for the use of bibliometric indicators for the evaluation of individual researchers; moreover, the quality of final reports varies considerably across panels, suggesting that measures should be put in place to prevent sloppy practices in future ASN rounds.
1. Introduction


The National Scientific Qualification (ASN) was introduced in 2010 as part of a global reform of the Italian 
university system. The new rules require that applicants for professorship positions in state-recognized universities 
must first acquire a national scientific qualification for the discipline and role applied to. 

The ASN is to be held once a year; at the time of writing, two rounds have been completed, which started in 2012 and 2013, respectively. Applicants are evaluated using quantitative indicators as well as expert assessment. The Italian Ministry of University and Research (MIUR) appoints 184 evaluation committees, one for each scientific discipline. Each committee consists of five members: four are selected from among full professors at Italian universities, and one from foreign universities or research institutions. Each committee processes all applications for both the associate and full professor levels in its field of competence.

Candidates are evaluated according to their scientific profile (research output and other scientific titles; see Section 2). However, in an attempt to limit the unfair selection practices that have been associated with the Italian concorso (Gerosa, 2001), applicants are also evaluated according to three bibliometric indicators of impact and scientific productivity defined by the MIUR. The reliance of the ASN on bibliometric indicators was welcomed by part of the academic community as a step towards more objective evaluation practices, but was also heavily criticized by others as a form of “career assessment by numbers” (a term first used by Kelly & Jennions, 2006) that runs against the best practices for the correct use of bibliometrics for the evaluation of individual researchers (Banfi & De Nicolao, 2013).
Further complaints were raised as soon as the final results were made available. The fraction of qualified applicants varied considerably across Scientific Disciplines (SDs), from a minimum of 15.1% to a maximum of 81.1% (Marzolla, 2015). Such large differences cannot be explained by the presence of uncompetitive applicants; rather, they suggest that the committees adopted different criteria for qualification, if not unfair evaluation practices (Abramo & D’Angelo, 2015). In addition, many applicants perceived the individual evaluations they received as hastily written and poorly motivated.

The issues above are not specific to the ASN: indeed, defining open, fair, and transparent evaluation procedures for the career advancement of scientists is a challenging task, as witnessed by the plurality of hiring practices adopted in different countries (Bennion & Locke, 2010; van den Brink et al., 2013; Dettmar, 2004; Vicker & Royer, 2006). The ASN is an interesting case study, since it produced a large amount of data that were made available on the Web for a short period of time. The data include, for each applicant: the list of publications and other scientific titles; the values of the bibliometric indicators; the outcome of the application (qualified/not qualified); and a written assessment by the evaluation panel.

In this paper we address the following two questions: (i) does the ASN comply with the best practices for the use of bibliometric indicators for evaluating individual researchers? (ii) do the final reports provide useful feedback to the applicants? Both questions concern the quality of the ASN, understood as its level of transparency and fairness.

The case study illustrated in this paper provides some important lessons about the risks and unintended side effects of evaluation procedures for academics, especially when too much emphasis is put on quantity rather than quality. As bibliometrics is used more and more frequently to support hiring and promotion decisions (Sahel, 2011), it is important to share the experience gathered in the field so that errors are not repeated. Moreover, nationwide research evaluation campaigns such as the ASN face additional challenges due to the large number of applications that must be processed. In these situations it is tempting for evaluation committees to “cut corners” and adopt sloppy practices that speed up the evaluation process but reflect negatively on those being evaluated.

As valuable byproducts, we study the frequency of publication categories appearing in the application forms and the structure of collaboration networks across scientific fields. The distribution of publication types can be used to understand how researchers in different disciplines disseminate their work. The structure and dynamics of interdisciplinary research collaboration are an important topic in their own right, which has attracted considerable interest (Newman, 2001; van Rijnsoever & Hessels, 2011; Wagner et al., 2011; Abbasi et al., 2012), and are relevant, e.g., for funding agencies wishing to identify and possibly support joint research and development activities.

Related work. Hiring and promotion procedures for academic staff vary considerably across countries. The Academic Career Observatory of the European University Institute has published a comprehensive overview of the recruitment and career advancement procedures in European countries and abroad¹, including information on salaries, access for non-nationals, and gender issues.

Qualification procedures somewhat similar to the ASN are already in place in other European countries, such as Germany, France, and Spain. In Germany there are two paths towards professorship positions: Assistants working towards the Habilitation, and Junior Professors, who must carry out a variety of tasks (including research, teaching, and management) but are not required to obtain the Habilitation. The German Habilitation is essentially a second PhD, and may consist of either a thesis or several publications of high quality (Enders, 2001). Similarly, the French habilitation à diriger des recherches is awarded to applicants with a strong publication record over a period of years, and is required to supervise PhD students and to apply for professor positions (Musselin, 2004). Finally, Spain introduced the accreditation² as a prerequisite to apply for Agregat and Catedratic positions (roughly equivalent to associate and full professor). The accreditation is granted by the Spanish national evaluation agency (ANECA) after a detailed assessment of the applicant's CV, including teaching, research experience, and list of publications. Of the three procedures above, the Spanish accreditation is the most similar to the ASN. However, the ASN is, to the best of our knowledge, the only scientific qualification that explicitly relies on bibliometric indicators of scientific productivity and impact to evaluate applicants. Also, while teaching activities play a significant role in the Spanish accreditation, they are barely considered by the ASN (see Appendix B).


¹ http://www.eui.eu/ProgrammesAndFellowships/AcademicCareersObservatory/Index.aspx, accessed on 2015-10-06.
² http://www.aneca.es/eng/Programmes/PEP, accessed on 2015-10-03.
A quantitative account of the ASN is given by Marzolla (2015): the author computes a set of descriptive statistics, showing among other things the fraction of qualified applicants and the distribution of the values of bibliometric indicators. The study shows that the fraction of successful applicants varies considerably across SDs, suggesting that the qualification criteria were interpreted differently by each evaluation panel. This is confirmed by the comparison of the bibliometric indicators of qualified and not qualified applicants, which shows that some panels were more likely than others to deviate from purely quantitative considerations when granting or denying qualification. Abramo & D’Angelo (2015) examine the relationship between the ASN outcome and the scientific merit of applicants, in order to identify possible cases of discrimination or favoritism. Discrimination refers to skilled applicants (according to their bibliometric indicators) who are denied qualification, while favoritism refers to under-performing applicants who are granted qualification. The results reveal that applicants who are not already employed by an academic institution (“outsiders”) tend to be penalized more. Finally, Pautasso (2015) studies the proportions and success rates of female applicants across the various SDs to investigate gender issues. While in most disciplines the success rates of female applicants are comparable to those of male applicants, the study observes a significantly lower proportion of female scientists applying to most SDs, especially for the full professor role.

Organization of this paper. This paper is organized as follows. In Section 2 we give some background information on the ASN. In Section 3 we examine the final reports: we study their length and average similarity as proxies of their perceived quality. In Section 4 we discuss whether the ASN methodology follows the current best practices for the correct use of bibliometric indicators for the evaluation of researchers. Finally, conclusions are presented in Section 5. Some interesting descriptive statistics on the ASN dataset, produced as a byproduct of the main analysis, are described in Appendix B.


2. Background 

In this section we provide some background on the ASN and the Italian university system; for a historical perspective, see Degli Esposti & Geraci (2010).

In Italy, each professor and researcher is bound to an SD representing a specific field of study. There are 184 SDs organized in 14 areas, shown in Table 1. Each SD is identified by a four-character code of the form AA/MC, where AA is the numeric ID of the area (01-14), M is a single letter identifying the macro-sector, and C is a single digit identifying the discipline within the macro-sector. The full list can be found in Appendix A.
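The code format described above is regular enough to be validated mechanically. The following Python sketch splits a code such as 09/H1 into its three components; the regular expression, the `SDCode` type and the function name `parse_sd_code` are illustrative assumptions, not part of any official ASN tooling.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for the AA/MC format described in the text:
# AA = two-digit area ID (01-14), M = macro-sector letter, C = discipline digit.
SD_CODE_PATTERN = re.compile(r"^(0[1-9]|1[0-4])/([A-Z])(\d)$")


class SDCode(NamedTuple):
    area: int          # numeric area ID, 1-14
    macro_sector: str  # single letter identifying the macro-sector
    discipline: int    # digit identifying the discipline in the macro-sector


def parse_sd_code(code: str) -> SDCode:
    """Split an SD code such as '09/H1' into area, macro-sector and discipline."""
    match = SD_CODE_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a valid SD code: {code!r}")
    area, macro_sector, discipline = match.groups()
    return SDCode(int(area), macro_sector, int(discipline))


print(parse_sd_code("09/H1"))  # SDCode(area=9, macro_sector='H', discipline=1)
```

Codes with an area outside 01-14, such as 15/A1, are rejected with a `ValueError`.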

Before 2010, there were three tenured roles at Italian universities: assistant professor (ricercatore universitario), associate professor (professore associato) and full professor (professore ordinario). Hiring procedures were handled by the universities advertising the positions, according to centrally defined rules mandated by state law. Applicants had to undergo a written and/or oral examination (concorso) whose exact details differed for each role.

Law 240/2010 replaced the role of tenured assistant professor with two fixed-term positions, called Type A and Type B researcher. Type B positions are meant to be a path towards the associate professor role, since universities hiring Type B researchers must allocate funding for their promotion in advance. Under the new rules, to apply for a permanent professor position at any state-recognized university, one must first obtain the ASN in the same SD and role (associate or full professor) applied for. A five-member evaluation panel, appointed by the MIUR for each discipline, grants or denies qualification after assessing the scientific profiles of applicants. The evaluation must take into account both the qualitative and the quantitative scientific profile of candidates. The qualitative profile consists of the list of publications and other scientific titles, such as coordination of research projects, patents, visiting positions at foreign institutions, and so on (teaching activity, however, is not considered); each panel must also provide an opinion on a limited set of publications submitted by each applicant in full text. The quantitative profile is assessed using three numeric indicators of impact and productivity.

Two sets of indicators are defined: bibliometric and non-bibliometric indicators. Bibliometric indicators apply to disciplines such as the hard sciences, biology and medicine, for which “sufficiently complete” citation databases exist. Specifically, bibliometric indicators apply to all disciplines of the nine areas Mathematics and Computer Sciences (MCS), Physics (PHY), Chemistry (CHE), Earth Sciences (EAS), Biology (BIO), Medical Sciences (MED), Agricultural Sciences and Veterinary Medicine (AVM), Civil Engineering and Architecture (CEA) and Industrial and Information Engineering (IIE), except 08/C1 - Design and technological planning of architecture, 08/D1 - Architectural design, 08/E1 - Drawing, 08/E2 - Architectural restoration and history, and 08/F1 - Urban and landscape planning and design; in addition, they apply to the whole macro-sector 11/E - Psychology.

The bibliometric indicators are the following (the normalization procedure will be described shortly): 

B.1 normalized number of journal papers;

B.2 normalized number of citations received;

B.3 normalized h-index.

Non-bibliometric indicators apply to all other disciplines (in general, social sciences and humanities), and are: 

N.1 normalized number of authored books;

N.2 normalized number of journal papers and book chapters; 

N.3 normalized number of papers published in “top” journals.

The lists of “top” journals mentioned in N.3 have been defined by panels of experts from the relevant SDs, appointed by the National Agency for the Assessment of Universities and Research (ANVUR), a public entity under the control of the MIUR.

Normalization of the raw indicators is used to limit the bias against young applicants, and is based on the concept of scientific age: the scientific age $SA(A)$ of an applicant $A$ who published their first paper in year $t_0(A)$ is defined as:

$$SA(A) := \max\{10,\; 2012 - t_0(A) + 1\}$$

Indicators B.1, N.1, N.2 and N.3 are normalized by multiplying their raw value by $10/SA(A)$. Indicator B.2 is normalized by dividing the raw number of citations by the scientific age. Finally, the value of B.3 is computed from the normalized number of citations per paper. Specifically, given a paper $p$, published in year $t_p$, that at time $t \geq t_p$ has received $C(p,t)$ citations, the normalized number of citations $S(p,t)$ for $p$ is defined as:

$$S(p,t) := \frac{4}{t - t_p + 1}\, C(p,t)$$


The normalized h-index $h_c$ is then the maximum integer such that $h_c$ papers of a given applicant received at least $h_c$ normalized citations each (Sidiropoulos et al., 2007).
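To make the normalization steps concrete, the following Python sketch computes the scientific age, the normalized counts, and the normalized h-index as defined above. It is an illustration under stated assumptions, not ANVUR's implementation: the function names are hypothetical, the reference year is 2012 as in the definition of $SA(A)$, and the weighting factor of 4 in $S(p,t)$ follows the contemporary h-index of Sidiropoulos et al. (2007) and is exposed as the parameter `gamma`.

```python
from typing import Iterable, Tuple

REFERENCE_YEAR = 2012  # year appearing in the definition of SA(A)


def scientific_age(first_pub_year: int, ref_year: int = REFERENCE_YEAR) -> int:
    """SA(A) := max{10, ref_year - t_0(A) + 1}, as stated in the text."""
    return max(10, ref_year - first_pub_year + 1)


def normalize_count(raw_value: float, sa: int) -> float:
    """Normalization of B.1, N.1, N.2 and N.3: raw value times 10 / SA(A)."""
    return raw_value * 10 / sa


def normalize_citations(raw_citations: float, sa: int) -> float:
    """Normalization of B.2: raw number of citations divided by SA(A)."""
    return raw_citations / sa


def normalized_h_index(papers: Iterable[Tuple[int, int]], t: int,
                       gamma: float = 4.0) -> int:
    """Normalized (contemporary) h-index h_c.

    `papers` holds (t_p, C(p, t)) pairs: publication year and citations at t.
    Each paper is scored S(p, t) = gamma * C(p, t) / (t - t_p + 1); h_c is the
    largest h such that h papers have a score of at least h.
    """
    scores = sorted((gamma * c / (t - t_p + 1) for t_p, c in papers),
                    reverse=True)
    h_c = 0
    for rank, score in enumerate(scores, start=1):
        if score < rank:
            break  # scores are sorted, so no later paper can qualify
        h_c = rank
    return h_c


# Hypothetical applicant: (publication year, citations received by 2012).
papers = [(2010, 6), (2008, 10), (2002, 20), (2011, 1)]
print(normalized_h_index(papers, t=2012))  # 3
```

In this example the normalized citation counts are 8, 8, about 7.27 and 2, so three papers have at least three normalized citations each and $h_c = 3$. The raw h-index of the same papers would be 3 as well; the normalization matters mostly when recent papers compete with older, more cited ones.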

We remark that the terms bibliometric and non-bibliometric are used in the official MIUR documentation, although 
their meaning does not match the one used by the scientometric community. For this reason we will use the generic 
term “quantitative indicator” to refer to both bibliometric and non-bibliometric indicators. 

The values of quantitative indicators are compared to minimum thresholds, defined as the medians of the values of the same indicators for tenured professors of the same role and SD applied for. Both the medians and the values of the quantitative indicators for each applicant are computed by ANVUR using data from Scopus and Web of Science (WoS). The lists of publications used to compute the medians, and the quantitative indicators of tenured professors, have not been made publicly available, so the computations cannot be independently verified.

Under the initial interpretation of the ASN rules, qualification could be granted only to applicants who strictly exceed at least two medians (one, for non-bibliometric disciplines); this was understood to be a necessary but not sufficient condition for qualification. Later, the MIUR relaxed this interpretation by allowing panels to grant qualification also to applicants who do not satisfy the constraint above, provided that the decision is motivated⁴. Applicants who failed to obtain the qualification were prevented from applying again during the next two years.
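The initial median rule can be expressed as a simple check. The sketch below assumes that the applicant's normalized indicator values and the corresponding medians are given in the same order; the function name `meets_median_rule` is hypothetical, and the check covers only the necessary condition, not the panel's overall judgment.

```python
from typing import Sequence


def meets_median_rule(values: Sequence[float], medians: Sequence[float],
                      bibliometric: bool) -> bool:
    """Initial ASN rule: strictly exceed at least two medians (bibliometric
    disciplines) or at least one median (non-bibliometric disciplines).

    Under the initial interpretation this was necessary, not sufficient.
    """
    if len(values) != len(medians):
        raise ValueError("values and medians must have the same length")
    exceeded = sum(1 for v, m in zip(values, medians) if v > m)  # strict
    return exceeded >= (2 if bibliometric else 1)


# Being equal to a median does not count as exceeding it.
print(meets_median_rule([12.0, 30.0, 5.0], [10.0, 35.0, 5.0], bibliometric=True))  # False
print(meets_median_rule([12.0, 40.0, 5.0], [10.0, 35.0, 5.0], bibliometric=True))  # True
```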


³ ANVUR (2013), National Scientific Qualification - normalization of indicators by academic age (Abilitazione scientifica nazionale - la normalizzazione degli indicatori per l’età accademica), http://www.anvur.org/attachments/article/253/normalizzazione_indicatori_0.pdf, accessed on 2015-10-06.

⁴ F. Profumo, Newsletter of the Ministry of Education, University and Research concerning some aspects of the new discipline for acquiring the national scientific qualification introduced with Law 30 December 2010, n. 240 (Nota Circolare del Ministero dell’Istruzione, dell’Università e della Ricerca su alcuni aspetti della nuova disciplina per il conseguimento dell’abilitazione scientifica nazionale introdotta dalla legge 30 dicembre 2010, n. 240), January 11, 2013, http://www.anvur.org/attachments/article/252/Circolare accessed on 2015-10-06.
Id  Code  Applications  Sample size  Coverage  Area Name
 1  MCS           2492         2116    84.91%  Mathematics and Computer Sciences
 2  PHY           4372         4372   100.00%  Physics
 3  CHE           2344         2344   100.00%  Chemistry
 4  EAS           1231         1231   100.00%  Earth Sciences
 5  BIO           6244         6244   100.00%  Biology
 6  MED           9987         9266    92.78%  Medical Sciences
 7  AVM           2093         1895    90.54%  Agricultural Sciences and Veterinary Medicine
 8  CEA           3599         3284    91.25%  Civil Engineering and Architecture
 9  IIE           4535         3860    85.12%  Industrial and Information Engineering
10  APL           6324         6322    99.97%  Antiquities, Philology, Literary Studies, Art History
11  HPP           5909         3975    67.27%  History, Philosophy, Pedagogy and Psychology
12  LAW           3037         2774    91.34%  Law
13  ECS           4853         4848    99.90%  Economics and Statistics
14  PSS           2129         1274    59.84%  Political and Social Sciences
Total            59149        53805    90.97%


Table 1: The table reports, for each area, the total number of submitted qualification applications (Applications), and the number (Sample size) and percentage (Coverage) of applications for which the CV and evaluation forms have been collected.
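As a consistency check on Table 1, the Coverage column and the totals can be recomputed from the other two columns. The Python sketch below uses only the counts reported in the table; the names `TABLE_1` and `coverage` are illustrative.

```python
# (area code, submitted applications, collected forms), as reported in Table 1
TABLE_1 = [
    ("MCS", 2492, 2116), ("PHY", 4372, 4372), ("CHE", 2344, 2344),
    ("EAS", 1231, 1231), ("BIO", 6244, 6244), ("MED", 9987, 9266),
    ("AVM", 2093, 1895), ("CEA", 3599, 3284), ("IIE", 4535, 3860),
    ("APL", 6324, 6322), ("HPP", 5909, 3975), ("LAW", 3037, 2774),
    ("ECS", 4853, 4848), ("PSS", 2129, 1274),
]


def coverage(collected: int, submitted: int) -> float:
    """Percentage of submitted applications whose forms were collected."""
    return 100.0 * collected / submitted


for code, submitted, collected in TABLE_1:
    print(f"{code}: {coverage(collected, submitted):.2f}%")

total_submitted = sum(s for _, s, _ in TABLE_1)
total_collected = sum(c for _, _, c in TABLE_1)
print(total_submitted, total_collected,
      f"{coverage(total_collected, total_submitted):.2f}%")  # 59149 53805 90.97%
```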



Figure 1: Structure of an application form. 


All ASN applications were submitted electronically through the Web site http://abilitazione.miur.it; each application was then automatically converted to a PDF document, like the one shown in Figure 1. The form contains the following elements:


1. Unique application ID; 

2. Applicant's first name, last name, and date of birth; the date of birth is a valuable detail because the triplet (first name, last name, date of birth) is a robust unique identifier (Smalheiser & Torvik, 2009);

3. List of publications;

4. List of additional scientific qualifications and titles.

The values of quantitative indicators, the application forms, and the final evaluations were made publicly available for a short period of time at the ASN Web site. Table 1 shows the number of submitted applications for each area, and the number of application forms and final reports that have been collected and are analyzed in this paper. Our dataset includes 53,805 pairs of forms (for each applicant, we managed to get either both the application and the final report, or neither of them). This corresponds to about 90% of all application forms, a sufficiently large subset. Unfortunately, the coverage is not uniform across scientific areas: Table 1 shows that the dataset is complete for areas PHY, CHE, EAS and BIO. Areas History, Philosophy, Pedagogy and Psychology (HPP) and Political and Social Sciences (PSS) are only partially covered, and no reports at all are available for the following 14 SDs:


• 01/A4 - Mathematical physics

• 06/D3 - Blood diseases, oncology and rheumatology

• 06/E1 - Heart, thoracic and vascular surgery

• 07/H1 - Veterinary anatomy and physiology

• 07/H5 - Clinical veterinary surgery and obstetrics

• 08/A1 - Hydraulics, hydrology, hydraulic and marine constructions

• 09/H1 - Information processing systems

• 11/A1 - Medieval history

• 11/A3 - Contemporary history

• 11/A4 - Science of books and documents, history of religions

• 11/C2 - Logic, history and philosophy of science

• 11/C4 - Aesthetics and philosophy of languages

• 12/B1 - Business, navigation and air law

• 14/C1 - General and political sociology, sociology of law


We remark that the coverage refers to the fraction of applications for which the PDF forms have been collected; the values of the quantitative indicators have been collected for all applicants, and were the subject of the analysis in Marzolla (2015).

It is interesting to observe that each application form has a unique ID that appears to have been generated sequentially. There are gaps in the sequence of IDs; these gaps can be attributed to the fact that our sample is not complete, to applications that were created but not finalized, and to applications that were withdrawn after submission. The maximum ID in our dataset is 94765, much larger than the number of submitted applications (59,149; see Marzolla, 2015). The German tank problem technique (Ruggles & Brodie, 1947) can be used to obtain an accurate estimate of the total number of applications created. A 95% confidence interval (CI) is [94765.04, 94771.5], which is compatible with the rough estimate based on the maximum ID alone.
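The estimate and interval above can be sketched as follows, assuming that IDs are assigned sequentially starting from 1 and that the k = 53,805 collected IDs behave like a uniform random sample. The point estimate is the standard minimum-variance unbiased estimator m + m/k - 1, where m is the maximum observed ID; the interval uses the approximation P(M <= x) ≈ (x/N)^k for the sample maximum M, which ignores the without-replacement correction. With m = 94765 and this k, the sketch yields the interval reported above; the function names are hypothetical.

```python
from typing import Tuple


def german_tank_estimate(max_id: int, k: int) -> float:
    """Minimum-variance unbiased estimate of N: m + m/k - 1."""
    return max_id + max_id / k - 1


def german_tank_ci(max_id: int, k: int,
                   alpha: float = 0.05) -> Tuple[float, float]:
    """Two-sided (1 - alpha) interval for N based on the sample maximum m.

    Uses P(M <= m | N) ~ (m / N)^k, which gives
    [m * (1 - alpha/2)^(-1/k), m * (alpha/2)^(-1/k)].
    """
    lower = max_id * (1 - alpha / 2) ** (-1 / k)
    upper = max_id * (alpha / 2) ** (-1 / k)
    return lower, upper


m, k = 94765, 53805
print(f"{german_tank_estimate(m, k):.2f}")  # 94765.76
low, high = german_tank_ci(m, k)
print(f"[{low:.2f}, {high:.2f}]")           # [94765.04, 94771.50]
```

Because the sample covers more than half of the ID range, the interval is only a few units wide, which is why it barely differs from the maximum ID itself.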


Appendix B provides additional descriptive statistics of the ASN dataset. 
Figure 2: Structure of a final report.


3. Analysis of final reports 


In this section we focus our attention on the final reports containing the assessment of each applicant. A typical report is shown in Figure 2, and contains the following elements:

1. Applicant’s last and first name; 

2. Collegial assessment (Giudizio collegiale) formulated by the whole panel; 

3-7. Individual assessments (Giudizi individuali) formulated by each member of the evaluation committee; the name of each committee member is shown above his or her evaluation, so the evaluations are not anonymous;

8. Result (qualified / not qualified). 


Most of the final reports are written in Italian, with the possible exception of the evaluations written by the foreign 
panel members. However, a few panels used a different language for the whole report. 

The reports are extremely important, especially for applicants who failed to obtain qualification: in these cases, it is reasonable to expect the reports to motivate the decision to deny qualification, and to provide feedback to improve the quality of the applicant's research output. A good report should list the strengths and weaknesses of each applicant, and provide an evaluation of each paper submitted in full text: does the paper address a topic that falls within the aim and scope of the SD? Is the contribution significant? Is the publication type appropriate? Did the publication have an impact on the scientific community? This is not dissimilar to the feedback that authors of a scientific paper submitted for peer review expect to receive (Shashok, 2008).

Unfortunately, as soon as the reports started to be made available, complaints were raised about their perceived poor quality. Among others, two issues were frequently reported: (i) very short reports that do not provide any useful feedback; (ii) reports that are very similar across applicants for the same SD, as if they were based on a template with only minor modifications. These issues are examples of anti-patterns (Koenig, 1995), i.e., common but counterproductive solutions to a problem.

The task of deciding whether a report is appropriate cannot be fully automated, since this would require natural language processing capabilities far beyond the current state of the art; besides, the definition of “appropriate” is subjective and cannot be encoded in any formal rule. However, the two anti-patterns above can be identified with the help of simple text metrics. In the following we focus on the length of the reports and on their dissimilarity, measured through a suitable text distance function.
Figure 3: Median length (number of words) of final reports for each discipline and role.



Role        Min.  1st Qu.  Median  3rd Qu.   Max.
Associate    153      616     936     1342  10030
Full         185      658    1050     1481  10970

Table 2: Five-number summary of the length (number of words) of final reports.

Collegial evaluation 

The scientific production of the applicant lies in the area of AAA BBB CCC, shows good coherence with the scientific discipline and good continuity, but is of limited quality. The applicant took part in national and international research projects.

Individual evaluations 

PANEL MEMBER XXX 

Publications: the applicant presents publications related to AAA BBB CCC; good fit with this discipline and temporal continuity; quality is poor and international visibility is very poor. Scientific titles: the applicant took part in national and international research projects. The applicant is not qualified.

(Four other similar individual evaluations omitted) 


Figure 4: A fragment of an actual report (translation from the original in Italian) 



Applications         Min.  1st Qu.  Median  3rd Qu.   Max.
Full professor          0    0.003   0.009    0.021  0.239
Associate professor     0    0.005   0.012    0.023  0.237

Table 3: Five-number summary of the relative difference $RD_i$ between the average lengths of final reports for qualified and not qualified applicants.


3.1. Length of final reports 

The length of a final report can be measured as the number of characters or words it contains; we use the number of words as a matter of convenience, since this yields smaller numbers that are easier to grasp intuitively.

Figure 3 shows the median length of the final reports for each discipline in our dataset; Table 2 shows the five-number summary (Tukey, 1977) of all lengths. The medians for full and associate professor applications are both about 1000 words, corresponding roughly to two pages like those shown in Figure 2. However, there is also a significant number of very short reports (200-300 words or fewer). These may be appropriate in some circumstances, e.g., if the applicant is obviously under-qualified or has applied to an unrelated SD: in these cases there is no need to provide a lengthy explanation. Figure 3, however, shows that some panels systematically produced shorter reports, and this cannot be explained by occasional low-quality candidates.
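The word counts and five-number summaries behind Table 2 can be computed along the lines of the following Python sketch. Two choices are assumptions rather than the documented procedure: words are taken to be whitespace-separated tokens, and quartiles are computed by linear interpolation between order statistics (the default convention of R's `quantile`), which can differ slightly from Tukey's original hinges. The example texts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def word_count(text: str) -> int:
    """Number of whitespace-separated tokens (assumed definition of a word)."""
    return len(text.split())


def quantile(sorted_xs: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics (R type 7)."""
    pos = q * (len(sorted_xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def five_number_summary(xs: Sequence[float]) -> Tuple[float, ...]:
    """(Min., 1st Qu., Median, 3rd Qu., Max.) of a non-empty sample."""
    s = sorted(xs)
    return tuple(quantile(s, q) for q in (0.0, 0.25, 0.5, 0.75, 1.0))


# Hypothetical report texts, only to show the pipeline.
example_reports = ["The applicant is not qualified.",
                   "Good fit with the discipline, but quality is poor."]
print([word_count(r) for r in example_reports])  # [5, 9]
print(five_number_summary([1, 2, 3, 4]))         # (1.0, 1.75, 2.5, 3.25, 4.0)
```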

As an actual example, Figure 4 shows the English translation of a portion of one of the short reports (about 300 words) for an applicant who failed to obtain qualification; we only show the collegial evaluation and one of the individual assessments, the other four being very similar. As can be seen, the content is quite vague: the publications are considered of “limited quality”, and the international visibility “very poor”, without any further explanation. Such an evaluation is of little use, and provides none of the feedback mentioned at the beginning of this section.

As a general rule, short reports should be closely scrutinized, since they are likely to be of low quality, like the one above. However, long reports should not be blindly considered better. For example, some panels listed the publications provided in full text by the applicant; in some cases the list appears multiple times in the same report, i.e., in the collegial assessment and in each of the five individual evaluations. Listing the same publications over and over again increases the length but does not improve the quality of the evaluation, unless the lists are used to provide an assessment of each publication, as is actually done by some panels (e.g., the reports of 09/B1 - Manufacturing technology and systems provide a detailed evaluation of each publication submitted for consideration). We will show later how the length of final reports should be combined with their textual distance to obtain a more robust quality indicator.

To study whether there are significant differences between the average lengths of reports for qualified and not qualified applicants, we define the following quantities. Let $LQ_i$ be the average length of reports for qualified applicants in discipline $i$, and $LNQ_i$ the average length of reports for not qualified applicants in $i$. The relative difference $RD_i$ of the lengths is defined as:

$$RD_i := \frac{|LQ_i - LNQ_i|}{\max(LQ_i, LNQ_i)}$$

Figure 5: Correlation between the number of applications and the median length of the final reports.

Table 3 shows the five-number summary of $RD_i$ for full and associate professor applications, respectively. The third quartile is about 0.02 for both roles; this means that the relative difference between the lengths of reports for successful and unsuccessful applications is very small, at most about 2% in 75% of the disciplines.
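The relative difference $RD_i$ defined above can be computed per discipline as in the following sketch; the function name and the input format (one list of report lengths, in words, per outcome) are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence


def relative_difference(qualified_lengths: Sequence[float],
                        not_qualified_lengths: Sequence[float]) -> float:
    """RD_i = |LQ_i - LNQ_i| / max(LQ_i, LNQ_i) for one discipline i.

    Both sequences must be non-empty (statistics.mean raises otherwise).
    """
    lq = mean(qualified_lengths)
    lnq = mean(not_qualified_lengths)
    return abs(lq - lnq) / max(lq, lnq)


# Hypothetical report lengths (in words) for one discipline.
print(round(relative_difference([1000, 1100], [950, 1050]), 4))  # 0.0476
```

Dividing by the larger of the two averages keeps $RD_i$ in [0, 1] and makes it symmetric in the two groups.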

We observe a negative correlation between the median length of evaluations and the number of applications in each SD (Figure 5). The rank-order correlation coefficient is $\rho = -0.29$ (95% CI $[-0.43, -0.14]$) for associate professor applications, and $\rho = -0.35$ (95% CI $[-0.48, -0.20]$) for full professor applications. The negative correlation may be explained by the fact that panels that had to process more applications could dedicate less time to each one. However, the correlation is weak, so the number of applications accounts for only a small part of the variation in report length.
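The rank-order (Spearman) correlation used above is the Pearson correlation of the ranks, with tied values assigned the mean of the positions they span. The Python sketch below computes the point estimate only; the confidence intervals reported above require an additional procedure (for instance, a bootstrap or the Fisher transformation) that is not shown here. The data in the usage example are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end are 0-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (number of applications, median report length) for five SDs.
applications = [120, 340, 95, 410, 260]
median_length = [1300, 900, 1250, 700, 1100]
print(spearman_rho(applications, median_length))  # -0.9
```

Rank correlation is a natural choice here because it only assumes a monotone relationship and is insensitive to the very long reports visible in the maximum of Table 2.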


3.2. Similarity among evaluation forms 

Another problem observed in some SDs is that the evaluations are almost identical, as if they were variations of the same template. To illustrate the problem, Figure 6 reports the translation of two actual evaluations written by the same committee member for two applicants, A (who obtained qualification) and B (who was denied qualification). The two texts differ only in the three words shown in bold. From these tiny differences it is difficult to understand why applicant A was granted qualification but B was not: indeed, the terms “consistent”, “fair” and “good” bear a positive meaning, suggesting that B met all the criteria for qualification. “Cloning” evaluations and changing just a few words is a sloppy practice that reduces the quality of final reports. In the following we assess the extent of this practice in all SDs.

We measure the similarity among the reports of each SD by computing the textual distance between documents. Two families of text distances are used in the literature: semantic distances, which measure whether two documents contain the same information, and string distances, which measure the similarity of their syntactic representations. String distances have the advantage of being easy to compute and content-agnostic; furthermore, they provide stronger evidence that two documents share a common textual template, as in the example above.

The Levenshtein distance (Levenshtein 1965) measures the similarity of two documents as the minimum number of edit operations (single-character insertions, deletions and substitutions) required to transform one document into the other (see Appendix D for details). We use the normalized Levenshtein distance, which produces a value in the interval [0, 1]. A distance of 0 denotes that the two documents are identical, while 1 denotes that the documents have no character in common. In practice, the normalized Levenshtein distance rarely exceeds 0.8 even between unrelated documents written in different languages; higher values are therefore very unlikely to be observed.

Applicant A

The publications presented by the applicant are considered sufficiently consistent with the scope of discipline XX/XX or the related interdisciplinary topics. The evaluation of the scientific contribution of the publications, in relation to the scope of scientific discipline XX/XX, is assessed using parameter set 1 in Annex B of the minutes of the meeting held on X XXXX 2013 describing the criteria adopted by the panel, is good. The productivity of the applicant, assessed on the basis of publications submitted in relation to discipline XX/XX, with particular reference to the last five years prior to the call, using the parameter set 2 in Annex B of the minutes of the meeting held on XX XXXX 2013 describing the criteria adopted by the panel, is overall good. Other qualifications submitted by the applicant to support his authority and scientific maturity in relation to scientific discipline XX/XX, are considered, based on the parameter set 3 described in Annex B of the minutes of the meeting held on XX XXXX 2013 which describes the criteria adopted by the panel, excellent.

Applicant B

The publications presented by the applicant are considered consistent with the scope of discipline XX/XX or the related interdisciplinary topics. The evaluation of the scientific contribution of the publications, in relation to the scope of scientific discipline XX/XX, is assessed using parameter set 1 in Annex B of the minutes of the meeting held on X XXXX 2013 describing the criteria adopted by the panel, is fair. The productivity of the applicant, assessed on the basis of publications submitted in relation to discipline XX/XX, with particular reference to the last five years prior to the call, using the parameter set 2 in Annex B of the minutes of the meeting held on XX XXXX 2013 describing the criteria adopted by the panel, is overall good. Other qualifications submitted by the applicant to support his authority and scientific maturity in relation to scientific discipline XX/XX, are considered, based on the parameter set 3 described in Annex B of the minutes of the meeting held on XX XXXX 2013 which describes the criteria adopted by the panel, good.

Figure 6: Two assessments written by the same member of one evaluation panel on two applicants (translation by the author). The differences are reported in bold.
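As a worked example of the normalization, assume the common convention of dividing the edit distance by the length of the longer string (Appendix D gives the exact definition used in the paper, which may differ):

$$
\hat{L}(a, b) = \frac{\operatorname{lev}(a, b)}{\max(|a|, |b|)}, \qquad
\hat{L}(\texttt{kitten}, \texttt{sitting}) = \frac{3}{\max(6, 7)} = \frac{3}{7} \approx 0.43
$$

Here the three edits are two substitutions (k to s, e to i) and one insertion (the final g). Under this convention, identical strings give $\hat{L} = 0$, and $\hat{L}$ can never exceed 1 because the longer string can always be obtained with at most $\max(|a|, |b|)$ edits.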

Given $N$ reports $\{R_1, \ldots, R_N\}$ for a given SD and role, we compute the pairwise distance $L_{i,j}$ between documents $R_i$ and $R_j$ for all $1 \le i < j \le N$. Before computing the distances, we strip all non-alphanumeric characters and convert uppercase letters to lowercase, to make the distance robust against changes in formatting marks. The empirical distribution of $L_{i,j}$ provides information about the mutual similarity of the documents in the set. Since computing all $N(N-1)/2$ distances is time-consuming, we consider a random sample of $N = 100$ reports for each SD and role.
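The procedure just described (normalization, random sampling, and all pairwise distances) can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: `normalize_text`, `levenshtein`, `normalized_levenshtein` and `median_pairwise_distance` are hypothetical helper names, whitespace is treated as non-alphanumeric and therefore removed, and distances are normalized by the length of the longer document.

```python
import random
from statistics import median


def normalize_text(text):
    """Lowercase and keep only alphanumeric characters (whitespace is dropped too)."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the dynamic-programming row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def median_pairwise_distance(reports, sample_size=100, seed=None):
    """Median of L_ij over all pairs i < j in a random sample of the reports."""
    if len(reports) > sample_size:
        reports = random.Random(seed).sample(reports, sample_size)
    docs = [normalize_text(r) for r in reports]
    distances = [
        normalized_levenshtein(docs[i], docs[j])
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
    ]
    return median(distances)  # requires at least two reports


# Hypothetical reports differing only in one evaluative word.
reports = [
    "The publications are GOOD.",
    "The publications are fair!",
    "The publications are good",
]
print(median_pairwise_distance(reports))
```

The dynamic program costs time proportional to the product of the two document lengths, which, multiplied by the number of pairs, explains why sampling is needed for long reports. Fixing `seed` makes the random sample reproducible.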

Figure 7 shows the medians of the normalized Levenshtein distances among the reports in the samples, for each discipline and role. Low values are a clear indication of low-quality reports that are similar to each other. On the other hand, high values cannot automatically be considered an indication of better reports. As an example, consider SD 06/N1 (Applied medical technologies). According to Figure 7, its final reports have the highest distance within area MED; Figure 3, however, shows that they are, on average, the shortest in MED. Manual examination of the reports shows that they are indeed short and uninformative. The problem here is that two short documents that differ in a few words have a higher normalized distance than two long documents that differ in exactly the same words (see Appendix D for a technical explanation). Therefore, short documents are more likely to have a higher normalized distance than longer ones.
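The length effect can be made concrete with a small calculation, again assuming normalization by the length of the longer document. For two documents of equal length $n$ whose edit distance is $k$:

$$
\hat{L} = \frac{k}{n}, \qquad
k = 10: \quad \hat{L} = \frac{10}{200} = 0.05 \ \ (n = 200), \qquad
\hat{L} = \frac{10}{2000} = 0.005 \ \ (n = 2000)
$$

The same ten edited characters therefore produce a normalized distance ten times larger for the shorter pair.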

The discussion above suggests that length and normalized textual distance, taken in isolation, are only weak indicators of the quality of the final reports, since they can produce false positives: low values are a clear indication of poorly written reports, but high values do not automatically denote better ones. A more robust indicator can be obtained by considering both metrics jointly. A simple way to do so is to produce a scatter plot such as the one in Figure 8, where each data point represents an SD whose coordinates are its median distance and its median report length; the dashed lines correspond to the global median length and distance. The plot for the associate professor level is almost identical and is not shown.

The “good” reports are those that are both long and have a high pairwise normalized distance; they are located in the upper right portion of the scatter plot. “Bad” reports, which are both short and undifferentiated, are located in the lower left portion. Hence, the scatter plot provides an easy way to identify the SDs that most likely produced low-quality reports.
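The quadrant reading of the scatter plot can be expressed as a simple classification rule. The Python sketch below is illustrative only: `classify_sds` is a hypothetical helper, the SD labels and values are made up, and by default it uses the median of the per-SD values as cut-offs, whereas the paper's dashed lines are described as global medians; explicit cut-offs can be passed instead.

```python
from statistics import median


def classify_sds(stats, dist_cut=None, len_cut=None):
    """Place each SD in a quadrant of the length/distance scatter plot.

    stats maps an SD label to (median_distance, median_length). By default
    the cut-offs are the medians of the per-SD values (an assumption).
    """
    if dist_cut is None:
        dist_cut = median([d for d, _ in stats.values()])
    if len_cut is None:
        len_cut = median([n for _, n in stats.values()])
    labels = {}
    for sd, (dist, length) in stats.items():
        if dist >= dist_cut and length >= len_cut:
            labels[sd] = "upper-right"   # long and differentiated: "good"
        elif dist < dist_cut and length < len_cut:
            labels[sd] = "lower-left"    # short and similar: "bad"
        else:
            labels[sd] = "mixed"         # one metric alone is inconclusive
    return labels


# Hypothetical SDs and values, for illustration only.
example = {
    "SD-a": (0.60, 3000),
    "SD-b": (0.20, 500),
    "SD-c": (0.65, 450),
    "SD-d": (0.25, 2800),
}
print(classify_sds(example))
```

In this made-up example, SD-c plays the role of the short-but-distant case discussed above: its high distance alone would look good, but combined with its short length it falls in the "mixed" region rather than the upper right.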
Figure 7: Median of the normalized Levenshtein distance among a random sample of 100 reports, for each SD and role; higher is better.


4. Discussion 


In the previous section we analyzed whether the ASN results provide useful feedback to the applicants. In this section we take a broader view and discuss the appropriateness of the ASN methodology, including the use of bibliometric indicators to evaluate individual applicants. This is particularly relevant because the ASN is the only national scientific qualification procedure that also uses quantitative indicators of productivity and impact to assess applicants.

Figure 8: Median report length versus normalized Levenshtein distance between final reports for the full professor applications. The dashed lines denote the median length and distance.

The recently published Leiden manifesto for research metrics (Hicks et al. 2015) describes ten best practices that should be followed when using bibliometrics as a tool to evaluate individuals or organizations. The best practices are quite general and can be applied to any scientific discipline; it is therefore instructive to check whether the ASN complies with them. Since the best practices are provided as high-level requirements rather than formal rules, the discussion will be somewhat subjective; to substantiate our claims we refer to the quantitative analysis of the previous section whenever appropriate. The best practices from the Leiden manifesto are the following:


1. Quantitative evaluation should support qualitative, expert assessment. In the ASN, a five-member panel is appointed for each SD and must take into account both the quantitative and the qualitative profile of applicants. Indeed, Marzolla (2015) observed that a considerable fraction of applicants who satisfy the quantitative requirements are nevertheless denied qualification; this fraction is not homogeneous across SDs, suggesting that the qualitative assessment was carried out differently by different panels. In any case, this shows that the ASN is, at least in principle, not driven by the numbers alone, and therefore this requirement appears to be met.

2. Performance should be measured against the research missions of the institution, group or researcher. The ASN rules have been centrally defined and applied to all SDs, the only distinction being between bibliometric and non-bibliometric disciplines (see Section 2). The quantitative indicators put in place for the two classes of disciplines are certainly not enough to cope with the variability of research practices and goals across fields of study. While each panel had the possibility to override at least part of the rules, very few of them did so. The Leiden manifesto remarks that “no single evaluation model applies to all contexts”; unfortunately, the ASN did precisely the opposite, applying a single model across all contexts.

Result  Criterion
Pass    1. Quantitative evaluation should support qualitative, expert assessment
Fail    2. Measure performance against the research missions of the institution, group or researcher
Pass    3. Protect excellence in locally relevant research
Fail    4. Keep data collection and analytic processes open, transparent and simple
Fail    5. Allow those evaluated to verify data and analysis
Pass    6. Account for variation by field in publication and citation practices
Pass    7. Base assessment of individual researchers on a qualitative judgment of their portfolio
Fail    8. Avoid misplaced concreteness and false precision
Fail    9. Recognize the systemic effects of assessment and indicators
Pass    10. Scrutinize indicators regularly and update them

Table 4: Ten criteria proposed in the Leiden manifesto for research metrics. See text for discussion.

3. Excellence in locally relevant research should be protected. Research excellence should not be identified with English-language publications only, since that would penalize activities with a regional or national scope (typical of the social sciences and humanities). For bibliometric disciplines, where English is predominant anyway, the ASN relies on bibliometric data from Scopus and Web of Science. Social sciences and humanities use paper-counting metrics and lists of “top” journals for each specific field. These journals are published in a variety of languages, allowing locally relevant research to be recognized.

4. The data collection and analytical processes should be kept open, transparent and simple. The ASN is based on a new and unproven methodology that has neither been discussed with the scientific community nor validated by experts in research evaluation. The official documents do not contain any reference to the state of the art or to known best practices. We therefore conclude that the ASN does not provide a suitable level of openness and transparency.

5. Allow those evaluated to verify data and analysis. The ASN fails (badly) to meet this requirement. In principle, applicants could verify the values of their quantitative indicators by computing them from Scopus and WoS data. However, not everyone has access to these databases; furthermore, the providers can update the values without notice, so there is no guarantee that the values observed by the applicants at some point in time are the same values that were made available to the panels. The situation concerning the medians is worse: the list of publications used to compute them has not been made public, so it is impossible to verify that the medians are correct. It should be observed that ANVUR released an updated set of threshold values⁵ to fix errors that were discovered after publication of the initial set of thresholds. This raises the serious concern that other issues may have gone unnoticed.

5 Consiglio direttivo ANVUR, On the computation of medians for the national scientific qualification (Sul calcolo delle mediane per l'abilitazione nazionale), Sep. 14, 2012, http://www.anvur.it/attachments/article/253/mediane_spiegate_definitivo_14_settembre_2012.pdf

6. Account for variation by field in publication and citation practices. It is well known that citation-based metrics vary significantly across fields of study (Albarran et al. 2011). Publication practices also vary: Table C.11 in the Appendix lists the four most frequent publication types for each SD in our dataset, showing considerable differences even among disciplines within the same macro-sector. The ASN addressed these issues by defining different thresholds for each SD and role. Provisions were also made to cope with multimodal distributions of quantitative indicators caused by the coexistence of different scientific communities within the same field of study.

7. Base assessment of individual researchers on a qualitative judgment of their portfolio. The ASN complies with this requirement. Indeed, applicants were required to submit a selection of their best publications to the evaluation panel, and the quality of those publications had to be assessed as part of the applicant evaluation. Note, however, that the analysis of the final reports described in Section 3 casts doubt on the accuracy of the qualitative judgment of applicants in some SDs.

8. Avoid misplaced concreteness and false precision. The thresholds of the quantitative indicators used in the ASN were meant to be “hard” values that applicants had to strictly exceed in order to be considered for qualification. This neglects the fact that the indicators are subject to uncertainty: should an applicant with a contemporary h-index of 10.4 be rejected if the minimum threshold is 10.5? While a few panels recognized the problem and adopted less stringent requirements, the vast majority stuck with the simplistic interpretation of hard thresholds.

9. Recognize the systemic effects of assessment and indicators. Scientists who are evaluated according to a set of rules inevitably tend to optimize their behavior to better fit the rules. The Leiden manifesto suggests that a pool of different metrics should be preferred to a single metric that can be easily gamed. The ASN complies with this suggestion, since it bases the evaluation on three quantitative indicators. However, we have observed that the values of the indicators are positively correlated in bibliometric disciplines (Marzolla 2015), suggesting that they might in fact measure the same thing. Hence, the systemic effects of the indicators do not appear to have been properly dealt with.

10. Scrutinize indicators regularly and update them. The MIUR made explicit provision to revise the criteria and parameters every five years⁶.


The discussion above is summarized in Table 4, where we show whether each requirement from the Leiden manifesto is satisfied. Since the ASN was defined before the publication of the manifesto, it is unreasonable to expect the ASN to comply fully. However, the manifesto did not appear out of the blue: the issues associated with the use of bibliometrics to evaluate individuals are well known and have already been described in the literature (Institut de France; Laloë & Mosseri 2009; Sahel 2011; IEEE; Okubo 1997).


5. Conclusions 


In this paper we have considered the Italian ASN as a case study in the evaluation of individual researchers for promotion. In particular, we were interested in assessing the appropriateness of the ASN in terms of fairness and of the quality of the feedback provided to applicants. To do so, we addressed the following two questions: (i) does the ASN comply with the best practices for the use of bibliometric indicators for evaluating individual researchers? (ii) do the final reports provide useful feedback to the applicants?

The answer to question (i) is only partially positive. We have considered the ten best practices for evaluating individual researchers through bibliometrics described in the Leiden manifesto for research metrics. The ASN fails to satisfy five out of ten requirements: the metrics are defined without taking into consideration the mission of the institution, group or researcher; the data collection and analysis process is not transparent; applicants are unable to verify the data and analysis; the possible lack of precision of the quantitative indicators is not taken into consideration; finally, the systemic effects of the assessment are overlooked.

To answer question (ii) we have used two simple measures (length and normalized Levenshtein distance) to analyze the content of the individual reports containing a written assessment of each applicant. These measures, both in isolation and in combination, show that the perceived poor quality of some reports is indeed justified.

Our analysis of the Italian ASN highlights several issues, listed below in no particular order: 


1. Understand and follow best practices. Rules and procedures for evaluating individual researchers should be defined with the help of experts in research evaluation, and should be discussed and accepted by the scientific communities. In the case of the ASN, Marzolla (2015) observed that the definition of the quantitative indicators and their medians generated several unintended side effects, including the “paradox of academic twins”⁷ (an applicant whose publications are a proper subset of those of another applicant might have higher, i.e., better, quantitative indicators). Also, in some disciplines the thresholds for qualification at the associate level were higher than those for the full professor level, implying that in those disciplines the lower academic rank had higher requirements. Finally, the use of journal rankings presents known issues (Vanclay 2011) that have not been addressed in the list of top journals used in non-bibliometric disciplines.

6 Ministerial Decree 76/2012, Criteria and Parameters for evaluation of applicants for the National Scientific Qualification (Regolamento recante criteri e parametri per la valutazione dei candidati ai fini dell'attribuzione dell'Abilitazione Scientifica Nazionale), Ministerial Decree 76, June 7, 2012, http://www.anvur.org/attachments/article/251/dm_07_06_12_regolamento_abilitazione.pdf art. 9

7 http://www.roars.it/online/sulla-revisione-dellasn-alcune-proposte/ accessed on 2015-10-06.

2. Allocate enough resources. Nation-wide research evaluation procedures should expect to receive a large number of applications; it is therefore important that sufficient resources (time and manpower) are allocated so that all applications are evaluated fairly and accurately. Some evaluation panels of the ASN were subject to unrealistic deadlines, and therefore required multiple extensions that delayed the publication of the results. This issue could be addressed by splitting the workload of the same SD across multiple panels and/or by simplifying the qualification procedure so that the workload becomes manageable.

3. Check for common anti-patterns. An obvious corollary of the point above is that when evaluation panels are subject to unrealistic deadlines, they inevitably tend to work sloppily in order to save time. A frequent complaint about the ASN concerns the poor quality of the final reports. The analysis in Section 3 shows that those complaints are in some cases justified. Suitable quality assurance mechanisms should be put in place to improve the quality of final reports and provide consistent feedback to applicants; such mechanisms are already used by some conferences to improve the quality of the paper review process (Canfora & Elbaum 2015).

4. Be transparent. Transparency is an important deterrent against unfair practices and corruption. In this context, 
transparency means that the output of the evaluation process should be public, so that ex-post analyses can be 
performed to identify issues. Moreover, if bibliometrics is used as part of the evaluation process, the indicators 
and their values should be verifiable by applicants. 
Appendix A. List of Scientific Disciplines

The list below enumerates all scientific areas (first indentation level), macro-sectors (second indentation level) and 
scientific disciplines. 


01 Mathematics and computer sciences 
01/A Mathematics 

01/A1 Mathematical logic, mathematics education and his¬ 
tory of mathematics 
01/A2 Geometry and algebra 

01/A3 Mathematical analysis, probability and statistics 
01/A4 Mathematical physics 
01/A5 Numerical analysis 
01/A6 Operational research 
01/B Computer Science 

01/BI Computer Science 

02 Physics 

02/A Physics of fundamental interactions 

02/A1 Experimental physics of fundamental interactions 
02/A2 Theoretical physics of fundamental interactions 
02/B Physics of matter 

02/BI Experimental physics of matter 
02/B2 Theoretical physics of matter 
02/B3 Applied physics 

02/C Astronomy, astrophysics, Earth and planetary physics 


02/Cl Astronomy, astrophysics, Earth and planetary physics 

03 Chemistry 

03/A Analytical and physical chemistry 
03/A1 Analytical chemistry 
03/A2 Models and methods for chemistry 
03/B Inorganic chemistry and applied technologies 

03/BI Principles of chemistry and inorganic systems 
03/B2 Chemical basis of technology applications 
03/C Organic, industrial and applied chemistry 
03/Cl Organic chemistry 
03/C2 Industrial and applied chemistry 
03/D Medicinal and food chemistry and applied technologies 

03/DI Medicinal, toxicological and nutritional chemistry and 
applied technologies 

03/D2 Drug technology, socioeconomics and regulations 
04 Earth sciences 

04/A Earth sciences 

04/A1 Geochemistry, mineralogy, petrology, volcanology, 
Earth resources and applications 
04/A2 Structural geology, stratigraphy, sedimentology and 
paleontology 
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04/A3 Applied geology, physical geography and geomorphol¬ 
ogy 

04/A4 Geophysics 

05 Biology 

05/A Plant biology 

05/A1 Botany 
05/A2 Plant physiology 
05/B Animal biology and anthropology 
05/BI Zoology and anthropology 
05/B2 Comparative anatomy and cytology 
05/C Ecology 

05/Cl Ecology 
05/D Physiology 

05/DI Physiology 

05/E Experimental and clinical biochemistry and molecular biology 
05/El General biochemistry and clinical biochemistry 
05/E2 Molecular biology 
05/F Experimental biology 

05/FI Experimental biology 
05/G Experimental and clinical pharmacology 

05/G1 Pharmacology, clinical pharmacology and pharmacog¬ 
nosy 

05/H Human anatomy and histology 
05/HI Human anatomy 
05/H2 Histology 
05/1 Genetics and microbiology 

05/11 Genetics and microbiology 

06 Medicine 

06/A Pathology and laboratory medicine 
06/A1 Medical genetics 

06/A2 Experimental medicine, pathophysiology and clinical 
pathology 

06/A3 Microbiology and clinical microbiology 
06/A4 Pathology 
06/B General clinical medicine 
06/BI Internal medicine 
06/C General clinical surgery 
06/Cl General surgery 
06/D Specialized clinical medicine 

06/DI Cardiovascular and respiratory diseases 
06/D2 Endocrinology, nephrology, food and wellness sci¬ 
ences 

06/D3 Blood diseases, oncology and rheumatology 
06/D4 Skin, contagious and gastrointestinal diseases 
06/D5 Psychiatry 
06/D6 Neurology 
06/E Specialized clinical surgery 

06/El Heart, thoracic and vascular surgery 
06/E2 Plastic and paediatric surgery and urology 
06/E3 Neurosurgery and maxillofacial surgery 
06/F Integrated clinical surgery 

06/FI Odontostomatologic diseases 
06/F2 Eye diseases 

06/F3 Otorhinolaryngology and audiology 
06/F4 Musculoskeletal diseases and physical and rehabilita¬ 
tion medicine 


06/G Paediatrics 

06/G1 Paediatrics and child neuropsychiatry 
06/H Gynaecology 

06/HI Obstetrics and gynecology 
06/1 Radiology 

06/11 Diagnostic imaging, radiotherapy and neuroradiology 
06/L Anaesthesiology 

06/Ll Anaesthesiology 
06/M Public health 

06/Ml Hygiene, public health, nursing and medical statistics 
06/M2 Forensic and occupational medicine 
06/N Applied medical technologies 

06/N1 Applied medical technologies 

07 Agricultural and veterinary sciences 

07/A Agricultural economics and appraisal 

07/A1 Agricultural economics and appraisal 
07/B Agricultural and forest systems 

07/BI Agronomy and field, vegetable, ornamental cropping 
systems 

07/B2 Arboriculture and forest systems 
07/C Agricultural, forest and biosyterns engineering 

07/Cl Agricultural, forest and biosystems engineering 
07/D Plant pathology and entomology 

07/DI Plant pathology and entomology 
07/E Agricultural chemistry and agricultural genetics 

07/El Agricultural chemistry, agricultural genetics and 
pedology 

07/F Food technology and agricultural microbiology 
07/FI Food science and technology 
07/F2 Agricultural microbiology 
07/G Animal science and technology 

07/G1 Animal science and technology 
07/H Veterinary medicine 

07/HI Veterinary anatomy and physiology 
07/H2 Veterinary pathology and inspection of foods of animal 
origin 

07/H3 Infectious and parasitic animal diseases 
07/H4 Clinical veterinary medicine and pharmacology 
07/H5 Clinical veterinary surgery and obstetrics 

08 Civil engineering and architecture 

08/A Landscape and infrastructural engineering 

08/A1 Hydraulics, hydrology, hydraulic and marine construc¬ 
tions 

08/A2 Sanitary and environmental engineering, hydrocarbons 
and underground fluids, safety and protection engineer¬ 
ing 

08/A3 Infrastructural and transportation engineering, real es¬ 
tate appraisal and investment valuation 
08/A4 Geomatics 

08/B Structural and geotechnical engineering 
08/BI Geotechnics 
08/B2 Structural mechanics 
08/B3 Structural engineering 
08/C Design and technological planning of architecture 

08/Cl Design and technological planning of architecture 
08/D Architectural design 

08/DI Architectural design 
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08/E Drawing, architectural restoration and history 
08/El Drawing 

08/E2 Architectural restoration and history 
08/F Urban and landscape planning and design 

08/FI Urban and landscape planning and design 

09 Industrial and information engineering 

09/A Mechanical and aerospace engineering and naval architecture 

09/A1 Aeronautical and aerospace engineering and naval ar¬ 
chitecture 

09/A2 Applied mechanics 

09/A3 Industrial design, machine construction and metallurgy 
09/B Manufacturing, industrial and managenent engineering 
09/BI Manufacturing technology and systems 
09/B2 Industrial mechanical plants 
09/B3 Business and management engineering 
09/C Energy, thermomechanical and nuclear engineering 

09/Cl Fluid machinery, energy systems and power generation 
09/C2 Technical physics and nuclear engineering 
09/D Chemical and materials engineering 

09/DI Materials science and technology 
09/D2 Systems, methods and technologies of chemical and 
process engineering 

09/D3 Chemical plants and technologies 
09/E Electrical and electronic engineering and measurements 
09/El Electrical technology 
09/E2 Electrical energy engineering 
09/E3 Electronics 
09/E4 Measurements 

09/F Telecommunications engineering and electromagnetic fields 
09/FI Electromagnetic fields 
09/F2 Telecommunications 
09/G Systems engineering and bioengineering 
09/G1 Systems and control engineering 
09/G2 Bioengineering 
09/H Computer engineering 

09/HI Information processing systems 

10 Antiquities, philology, literary studies, art history 

10/A Archaeological sciences 
10/A1 Archaeology 
10/B Art history 

10/BI Art history 

10/C Cinema, music, performing arts, television and media studies 

10/Cl Cinema, music, performing arts, television and media 
studies 

10/D Sciences of antiquity 

10/DI Ancient history 
10/D2 Greek language and literature 
10/D3 Latin language and literature 
10/D4 Classical and late antique philology 
10/E Medieval latin and romance philologies and literatures 

10/El Medieval latin and romance philologies and literatures 
10/F Italian studies and comparative literatures 

10/FI Italian literature, literary criticism and comparative lit¬ 
erature 

10/F2 Contemporary Italian literature 
10/F3 Italian linguistics and philology 


10/G Glottology and linguistics 

10/G1 Glottology and linguistics 
10/H French studies 

10/HI French language, literature and culture 
10/1 Spanish and Hispanic studies 

10/11 Spanish and Hispanic languages, literatures and cul¬ 
tures 

10/L English and Anglo-American studies 

10/Ll English and Anglo-American languages, literatures 
and cultures 

10/M Germanic and Slavic languages, literatures and cultures 
10/Ml Germanic languages, literatures and cultures 
10/M2 Slavic studies 
10/N Eastern cultures 

10/N1 Ancient Near Eastern, Middle Eastern and African cul¬ 
tures 

10/N3 Central and East Asian cultures 

11 History, philosophy, pedagogy and psychology 

11/A History 

11/A1 Medieval history 
11/A2 Modern history 
11/A3 Contemporary history 

11/A4 Science of books and documents, history of religions 
11/A5 Demography, ethnography and anthropology 
11 /B Geography 

11/BI Geography 
11/C Philosophy 

11/Cl Theoretical philosophy 
11/C2 Logic, history and philosophy of science 
11/C3 Moral philosophy 
11/C4 Aesthetics and philosophy of languages 
11/C5 History of philosophy 
11/D Educational theories 

11/DI Educational theories and history of educational theo¬ 
ries 

11/D2 Methodologies of teaching, special education and edu¬ 
cational research 

11/E Psychology 

11/El General psychology, psychobiology and psychomet¬ 
rics 

11/E2 Developmental and educational psychology 
11/E3 Social psychology and work and organizational psy¬ 
chology 

11/E4 Clinical and dynamic psychology 

12 Law studies 

12/A Private law 

12/A1 Private law 

12/B Business, navigation and air law and labour law 
12/BI Business, navigation and air law 
12/B2 Labour law 

12/C Constitutional and ecclesiastical law 
12/Cl Constitutional law 
12/C2 Ecclesiastical law and canon law 
12/D Administrative and tax law 
12/DI Administrative law 
12/D2 Tax law 

12/E International and European Union law, comparative, economics 
and markets law 
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12/El International and European Union law 
12/E2 Comparative law 

12/E3 Economics, financial and agri-food markets law and 
regulation 

12/F Civil procedural law 

12/FI Civil procedural law 
12/G Criminal law and criminal procedure 
12/G1 Criminal law 
12/G2 Criminal procedure 

12/H Roman law, history of medieval and modem law and philosophy 
of law 

12/HI Roman and ancient law 

12/H2 History of medieval and modem law 

12/H3 Philosophy of law 

13 Economics and statistics

13/A Economics
13/A1 Economics
13/A2 Economic policy
13/A3 Public economics
13/A4 Applied economics
13/A5 Econometrics
13/B Business administration and Management
13/B1 Business administration and Management
13/B2 Management
13/B3 Organization studies
13/B4 Financial Markets and Institutions
13/B5 Commodity science
13/C Economic history
13/C1 Economic history
13/D Statistics and mathematical methods for decisions
13/D1 Statistics
13/D2 Economic statistics
13/D3 Demography and social statistics
13/D4 Mathematical methods of economics, finance and actuarial sciences

14 Political and social sciences

14/A Political theory
14/A1 Political philosophy
14/A2 Political science
14/B Political history
14/B1 History of political thought and institutions
14/B2 History of international relations and of non-European societies and institutions
14/C Sociology
14/C1 General and political sociology, sociology of law
14/C2 Sociology of culture and communication
14/D Applied sociology
14/D1 Sociology of economy and labour, sociology of land and environment


Appendix B. Descriptive Statistics 

In this section we report some descriptive statistics that can be derived from the application forms. The statistics 
provide useful contextual information on the demography and behavior of applicants, including: the age distribution, 
the frequency of publication types and scientific titles in each area, and the structure of the co-qualification graph. 


Age distribution of applicants. Figure B.9 shows the age distribution of applicants for the full and associate professor roles; individuals applying for multiple qualifications are counted once per role. The five-number summary shows that applicants for the full professor role tend to be older than those applying for the associate role: the sample median age is 49 years for full and 42 years for associate professor applicants.
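The five-number summaries reported with Figure B.9 (minimum, first quartile, median, third quartile, maximum) can be recomputed from raw ages. The sketch below is not the authors' code. Its column labels match the output of R's `summary()`, so it assumes quartiles obtained by linear interpolation between order statistics (R's default `quantile` definition), which Python's `statistics.quantiles` reproduces with `method="inclusive"`. The function name and the sample ages are hypothetical.

```python
import statistics


def five_number_summary(ages):
    """Return (min, 1st quartile, median, 3rd quartile, max) of a sample.

    Quartiles are linear interpolations between order statistics
    (statistics.quantiles with method="inclusive", matching R's default).
    """
    q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
    return min(ages), q1, median, q3, max(ages)


# Hypothetical ages of a few applicants (illustration only, not ASN data).
sample_ages = [31, 35, 38, 42, 42, 45, 51, 57, 63]
print(five_number_summary(sample_ages))
```

Other quartile conventions (e.g., the default `method="exclusive"`) can shift the first and third quartiles slightly on small samples, so the choice matters when comparing against published summaries.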

Looking at the individual scientific areas (Figure B.10), we observe that the age of applicants spans a wide range. Area Medical Sciences (MED) has the highest median age for both associate (46 years) and full professor applicants (53 years). The youngest successful applicant was 27 years old (in 2012), while the oldest was 69 years old. It is worth noting that the retirement age for university professors in Italy is currently set at 70 years; yet 12 qualified applicants for the associate and 85 for the full professor role are over 65 years old. These applicants are unlikely to be promoted before they retire.

Are older applicants more (or less) likely to qualify than younger ones? To answer this question we use a probit regression model (Bliss, 1934) to study the dependency of the ASN result (qualified/not qualified) on the applicant's age. A probit model assumes that the qualification probability for a given age x can be expressed as:

$$\Pr(\text{Qualified} \mid \text{Age} = x) = \Phi(\beta \times x) \qquad \text{(B.1)}$$

for a suitable scalar parameter $\beta$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. Positive values of $\beta$ denote that older applicants are more likely to qualify, while negative values denote a negative correlation between age and qualification probability.


Table B.5 shows 95% CIs for the value of $\beta$ for each area and role. Negative correlation is observed, among others, for both roles in areas MCS, CHE, Law (LAW) and Economics and Statistics (ECS). Positive correlation is observed for both roles in area Antiquities, Philology, Literary Studies, Art History (APL). Where the CI for $\beta$ includes zero, we cannot reject the hypothesis that the qualification probability is unrelated to age.
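Estimating $\beta$ in Eq. B.1 is a one-parameter maximum-likelihood problem. The following Python sketch is not the authors' code and makes several assumptions: it fits the model exactly as written in Eq. B.1 (no intercept), uses Fisher scoring with only the standard library, and reports a Wald interval $\hat\beta \pm 1.96\,\mathrm{SE}$; the text does not say how the intervals in Table B.5 were computed. The function names and the toy data are hypothetical.

```python
import math


def _norm_cdf(z):
    """Standard normal cumulative distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _score_and_information(beta, ages, qualified):
    """Score and expected (Fisher) information of the probit log-likelihood."""
    score, info = 0.0, 0.0
    for x, y in zip(ages, qualified):
        z = beta * x
        # Clamp the probability away from 0 and 1 to avoid division by zero.
        p = min(max(_norm_cdf(z), 1e-12), 1.0 - 1e-12)
        d = _norm_pdf(z)
        score += x * d * (y / p - (1 - y) / (1.0 - p))
        info += (x * d) ** 2 / (p * (1.0 - p))
    return score, info


def fit_probit_age(ages, qualified, max_iter=100, tol=1e-10):
    """Fit Pr(Qualified | Age = x) = Phi(beta * x) by maximum likelihood.

    Returns the estimate of beta and a 95% Wald confidence interval.
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, ages, qualified)
        step = score / info  # Fisher scoring update
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the information at the final estimate.
    _, info = _score_and_information(beta, ages, qualified)
    half_width = 1.96 / math.sqrt(info)
    return beta, (beta - half_width, beta + half_width)


# Hypothetical toy data: ages and outcomes (1 = qualified, 0 = not qualified).
toy_ages = [30] * 10 + [60] * 10
toy_outcomes = [1] * 3 + [0] * 7 + [1] * 8 + [0] * 2
beta_hat, (ci_low, ci_high) = fit_probit_age(toy_ages, toy_outcomes)
print(f"beta = {beta_hat:.4f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

In the toy data the older group qualifies more often, so the estimate of $\beta$ is positive; an interval that excludes zero would correspond to a '+' or '-' entry in Table B.5.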
Figure B.9: Age distribution of applicants. Five-number summary of applicant age (years):

                                  Min.   1st Qu.   Median   3rd Qu.   Max.
Associate professor applicants      23        37       42        48     74
Full professor applicants           25        44       49        54     74

Figure B.10: Age distribution of applicants by area.


Distribution of publication types. The publications that can be listed in the application forms are divided into seven categories: journal contribution, volume contribution, book, contribution in proceedings, patent, curatorship, and other publication type. Table B.6 shows the seven main categories and all subcategories with their counts. The same publication may be counted multiple times, e.g., if it has multiple authors applying for qualification, or if one of the authors applied for qualification in several disciplines or roles. We did not attempt to remove duplicates, since doing so would have had little impact on the ranking of publication types at the cost of considerable technical complexity.
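Table B.6 is essentially a frequency table over publication-type labels. The following minimal sketch shows how counts, percentages and ranks of this kind could be derived. It assumes that each publication listed in an application is represented by its type label; this input format and the function name are hypothetical. As in the text, duplicates are deliberately kept.

```python
from collections import Counter


def publication_type_table(records):
    """Count publication types and attach percentage and frequency rank.

    `records` is an iterable of publication-type labels, one per publication
    listed in an application (duplicates across applications are kept).
    Returns a list of (type, count, percent, rank), most frequent first.
    """
    counts = Counter(records)
    total = sum(counts.values())
    rows = []
    # most_common() sorts by decreasing count; ties keep first-seen order.
    for rank, (ptype, count) in enumerate(counts.most_common(), start=1):
        rows.append((ptype, count, round(100.0 * count / total, 2), rank))
    return rows


# Hypothetical labels (illustration only).
labels = ["Journal paper"] * 5 + ["Paper in proceedings"] * 3 + ["Book chapter"] * 2
for row in publication_type_table(labels):
    print(row)
```

Tied counts receive consecutive ranks in first-seen order here, which is an arbitrary choice.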

The five most frequent types (journal article, paper in proceedings, book chapter, abstract in proceedings, and abstract in journal, in this order) together represent more than 90% of all publications appearing in the dataset. The small but non-negligible fraction of “Other publication types” (1.39%) consists mostly of technical reports that have not been formally published.
Area                                                        Full professor          Associate professor
MCS  Mathematics and Computer Sciences                      [-0.0270, -0.0002] -    [-0.0461, -0.0225] -
PHY  Physics                                                [-0.0169, 0.0025]       [0.0015, 0.0157]   +
CHE  Chemistry                                              [-0.0334, -0.0055] -    [-0.0211, -0.0003] -
EAS  Earth Sciences                                         [-0.0070, 0.0360]       [0.0098, 0.0382]   +
BIO  Biology                                                [-0.0037, 0.0138]       [0.0011, 0.0122]   +
MED  Medical Sciences                                       [-0.0106, 0.0014]       [-0.0142, -0.0062] -
AVM  Agricultural Sciences and Veterinary Medicine          [-0.0196, 0.0156]       [-0.0030, 0.0170]
CEA  Civil Engineering and Architecture                     [0.0002, 0.0224]   +    [-0.0095, 0.0058]
IIE  Industrial and Information Engineering                 [-0.0053, 0.0162]       [-0.0237, -0.0066] -
APL  Antiquities, Philology, Literary Studies, Art History  [0.0028, 0.0187]   +    [0.0021, 0.0119]   +
HPP  History, Philosophy, Pedagogy and Psychology           [-0.0018, 0.0177]       [-0.0290, -0.0166] -
LAW  Law                                                    [-0.0444, -0.0192] -    [-0.0562, -0.0367] -
ECS  Economics and Statistics                               [-0.0403, -0.0228] -    [-0.0393, -0.0242] -
PSS  Political and Social Sciences                          [-0.0060, 0.0281]       [-0.0194, 0.0034]

Table B.5: 95% confidence intervals for $\beta$ (Eq. B.1), by area and role. '+' denotes positive correlation between age and qualification probability, i.e., older applicants are more likely to qualify; '-' denotes negative correlation; no sign means that the interval includes zero.


Each SD has its own practices regarding the preferred venues for disseminating research output; these differences are apparent in Table C.11 in Appendix C, which lists the four most common publication types for each SD. Journal papers are the most common type in areas Mathematics and Computer Sciences, Physics, Chemistry, Earth Sciences, Biology, Medical Sciences, and Agricultural Sciences and Veterinary Medicine, with the notable exception of 01/B1-Computer Science, where the most common publication type is the paper in proceedings. This peculiarity of 01/B1 is consistent with the DBLP computer science bibliography, which indexes 2.6 million publications by 1.4 million authors; at the time of writing, 55.99% of the bibliographic entries in DBLP are conference papers, and 39.94% are journal papers.[7]

A common trait of the areas above, apart from a few cases, is that the four most common publication types account 
for more than 90% of the total number of publications. In the remaining areas (Civil Engineering and Architecture, 
Industrial and Information Engineering, Antiquities, Philology, Literary Studies, Art History, History, Philosophy, 
Pedagogy and Psychology, Law, Economics and Statistics, and Political and Social Sciences), the most frequent 
publication type is again the journal article, with a significant number of disciplines where conference proceedings 
or book chapters are the preferred media. Interestingly, the social sciences and humanities adopt more diversified 
dissemination practices: the four most frequent publication types account for about 70%-80% of the publications. 
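Whether the four most frequent types cover more than 90% of a discipline's publications, or only 70%-80%, can be checked directly from per-discipline counts like those in Table C.11. The following minimal sketch assumes that the counts for one SD are given as a mapping from the Appendix C keys to counts. The helper name and the toy numbers are hypothetical.

```python
from collections import Counter


def top_k_share(type_counts, k=4):
    """Return the k most frequent publication types of one SD and the
    percentage of all its publications that they cover.

    `type_counts` maps publication-type keys (e.g. "JRNL") to counts.
    """
    counts = Counter(type_counts)
    total = sum(counts.values())
    top = counts.most_common(k)
    share = 100.0 * sum(count for _, count in top) / total
    return top, share


# Hypothetical counts for a single SD (illustration only).
toy_counts = {"JRNL": 60, "PROC": 20, "CHAP": 10, "ABSP": 5, "POS": 3, "PAT": 2}
top, share = top_k_share(toy_counts)
print(top, f"{share:.2f}%")
```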

While there are as yet no comprehensive studies on the frequency of publication types across scientific areas, some data have been analyzed for Norway and Australia. Sivertsen (2009) analyzes the frequency of articles in journals (with ISSN), articles in books (with ISBN), and books for all scientific fields in the Norwegian higher education sector; articles in books here also include papers in conference proceedings. The data show that publication patterns differ considerably across disciplines and also within subfields of the same discipline, in particular within the social sciences and humanities. This is consistent with our findings (see Appendix C). Also, publication types in the Norwegian computer science community show the same skew towards conference papers that we observe.

The report of the Excellence in Research for Australia (ERA) evaluation (ERA Report) contains statistics on the publications submitted as part of the national evaluation of Australian universities and research institutes. Caution should be exercised in comparing ERA and ASN, since they have very different goals: ERA evaluates research institutions, while the ASN evaluates individuals.

ERA classifies research outputs in three main categories: 


Traditional outputs: Books, book chapters, conference publications and journal articles; 


[7] http://dblp.uni-trier.de/statistics/distributionofpublicationtype, accessed on 2015-10-06.
Publication type                                          Count        %   Rank
Journal contribution                                    2276633    62.68
  Journal paper                                         2115083    58.23      1
  Abstract in journal                                    100142     2.76      5
  Review in journal                                       49099     1.35      8
  Comment of verdict                                       9457     0.26     14
  Translation in journal                                   2402     0.07     21
  Bibliography                                              450     0.01     33
Volume contribution                                      417025    11.48
  Book chapter                                           356326     9.81      3
  Dictionary or encyclopedia entry                        28635     0.79     10
  Catalogue entry                                         15476     0.43     12
  Preface/postface                                         7881     0.22     15
  Translation in volume                                    4382     0.12     17
  Introduction                                             3529     0.10     18
  Review in volume                                          796     0.02     28
Book                                                      93475     2.57
  Monograph or scientific treatise                        80800     2.22      6
  Book translation                                         4935     0.14     16
  Bibliographic entry                                      3209     0.09     19
  Critical edition of books/archaeological excavation      2676     0.07     20
  Scientific commentary                                     791     0.02     29
  Publication of new literary or archivistic document       647     0.02     31
  Index                                                     260     0.01     36
  Concordance                                               157     0.00     37
Contribution in proceedings                              728415    20.05
  Paper in proceedings                                   538856    14.84      2
  Abstract in proceedings                                164951     4.54      4
  Poster                                                  24608     0.68     11
Patents                                                   14446     0.40
  Patent                                                  14446     0.40     13
Curatorship                                               40196     1.11
  Curatorship                                             40196     1.11      9
Other                                                     62064     1.71
  Other publication types                                 50554     1.39      7
  Composition                                              2043     0.06     22
  Database                                                 1732     0.05     23
  Exhibition                                               1604     0.04     24
  Software                                                 1497     0.04     25
  Exposition                                               1324     0.04     26
  Chart                                                    1133     0.03     27
  Drawing                                                   660     0.02     30
  Design                                                    591     0.02     32
  Performance                                               401     0.01     34
  Artifact                                                  373     0.01     35
  Art prototype                                             152     0.00     38
Total                                                   3632254   100.00

Table B.6: Counts of publication types. Percentages refer to the fraction of each type with respect to the total number of publications submitted by all applicants; Rank is the rank of each type according to the frequency of occurrence in the CVs.


Non-traditional outputs: Curated or exhibited event, live performance, original creative work, recorded/rendered 
work, portfolio of non-traditional research outputs; 


Output types within portfolios: Curated exhibited events, live performance, original creative work, recorded rendered work.

Scientific title                                            Associate                  Full
                                                            Count   % Appl.   Rank     Count   % Appl.   Rank
Other titles                                                28459     76.60      1     12936     77.69      1
Participation to research projects                          27754     74.70      2         1      0.01     10
Research or teaching fellowships abroad                     18192     48.96      3      9246     55.53      3
Scientific awards                                           16566     44.59      4      8135     48.86      5
Membership of editorial board of journals                   13954     37.56      5      8837     53.07      4
Involvement with foreign research institutes                11521     31.01      6         1      0.01     10
Technology transfer activities (e.g., startups)              5548     14.93      7      3642     21.87      7
Direction of research institutes                                1      0.00      9      1466      8.80      9
Membership of scientific academies                              1      0.00      9      3661     21.99      6
Coordination of research projects                               1      0.00      9     11275     67.71      2
Editor in chief of journals, encyclopedias, or treatises        0      0.00     11      2796     16.79      8
Number of applications                                      37154                      16651

Table B.7: Number of applications with at least one instance of a given scientific title. Percentages refer to the fraction of applications with at least one instance of the given title; therefore, the percentages do not sum to 100.


More than 413,000 research outputs were submitted to the ERA: 69% were journal articles, 18% conference papers, 10% book chapters, 1% books, and the remaining 2% non-traditional outputs. These percentages are quite similar to the percentages of journal contributions, contributions in proceedings, volume contributions and books shown in Table B.6 (62.68%, 20.05%, 11.48% and 2.57%, respectively). Looking at individual disciplines, 62% of research outputs within the ERA research area “Information and computing sciences” are conference papers, 30% are journal articles, 7% book chapters, and less than 1% books. These figures are similar to those observed in our dataset for 01/B1-Computer Science.


Table C.11 shows that abstracts are unusually common in many ASN disciplines, in particular those of areas 05 (BIO) and 06 (MED). For example, in the curricula of applicants for 06/E2-Plastic and paediatric surgery and urology, abstracts in proceedings alone account for 20.14% of all listed publications, with abstracts in journals contributing a further 12.54%. Since the ranking of publication types remains the same even if we consider successful applicants only, abstracts are not used only by low-quality applicants, but instead play an important role in the dissemination of research results in some scientific communities.

The role of abstracts that emerges from our dataset is more prominent than what can be inferred from other sources. For example, while abstracts represent 15% of the publications of successful applicants in area MED, they constitute only 4% of the references listed by PubMed, a bibliographic database of biomedical research papers.[8] The origin of this difference should be investigated in future studies.

Distribution of scientific titles. The last part of the application form contains the list of additional scientific qualifications (also called scientific titles) of the candidate. The list of allowed scientific titles, which is the same for both associate and full professor applicants, is reported in Table B.7. Candidates were required to supply additional details in some cases; for example, an applicant claiming “Participation to research projects” had to specify the project name, duration and role assumed (e.g., participant, task coordinator, affiliate member).

The most frequently mentioned title, appearing in 76.60% of the applications for the associate and 77.69% for the full professor role, is the catch-all category “Other titles”. Manual examination reveals that candidates used this category to list teaching duties, service activities (conference organization, coordination of Master or PhD programs, program committee memberships), invited presentations and consulting activities. All these items seem relevant, and the fact that they appear frequently suggests that they should be given dedicated entries of their own.

Teaching experience is a conspicuous omission from the list of qualifications: research and teaching fellowships at foreign universities can be indicated, but teaching activities at Italian institutions cannot. While the ASN is intended to attest only the scientific qualification of applicants, professors at Italian universities are required to teach (there are no research-only positions in Italy).


[8] https://www.nlm.nih.gov/bsd/licensee/2014_stats/2014_less_QLDMEDLINE_L0.html, accessed on 2015-10-06.
n       Applicants that submitted  %         Qualified applicants that  %
        n applications                       acquired n qualifications
1       27374                      73.37     17123                      86.50
2       6726                       18.03     2071                       10.46
3       1670                       4.48      397                        2.01
4       853                        2.29      136                        0.69
5       259                        0.69      35                         0.18
>5      430                        1.15      34                         0.17
Total   37312                      100.00    19796                      100.00

Table B.8: Number of individuals that submitted n applications; number of applicants that received n qualifications. 


Min.    1st Qu.    Median    3rd Qu.    Max.
0.001   0.004      0.006     0.014      0.338


Table B.9: Five-number summary of the nonzero entries of the co-qualification matrix.


Table B.7 shows a couple of differences between associate and full professor applications. “Coordination of research projects”, “Editor in chief of journals, encyclopedias, or treatises”, “Membership of scientific academies” and “Direction of research institutes” are claimed almost exclusively by applicants for the full professor role (at most one associate professor application claims each of them). This is understandable, since these roles, in particular the direction of research institutes, are usually held by well-established scientists who are likely approaching the top of the academic ladder. Note that department heads and team leaders of Italian national research centers (CNR, INFN, ENEA, ...) are not necessarily university professors, and some of them applied to the ASN claiming (correctly) direction of research institutes. Interestingly, of the 1,467 applications claiming direction of research institutes, only 762 were successful.

On the other hand, “Participation to research projects” and “Involvement with foreign research institutes” are claimed by candidates for associate professor qualification only, again with a single exception each. We see no obvious reason why applicants for the higher role should not pursue these activities; perhaps they are simply considered not worth mentioning.


Co-qualification analysis. The ASN allowed individuals to apply for qualification in multiple SDs and roles. Table B.8 shows how many candidates submitted n different applications, and how many received n qualifications. Our dataset contains 53,805 applications from 37,312 individuals. Most applicants (73.37%) submitted a single application, but a significant fraction (18.03%) submitted two. The maximum number of applications submitted by one individual is 34 (none of them was successful). Overall, 19,796 applicants were granted at least one qualification; 86.50% of them acquired exactly one qualification, and 10.46% got two. The most successful applicant qualified for both roles in 8 SDs, collecting a total of 16 qualifications.
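The grouping behind Table B.8 (individuals by number of applications submitted, and qualified individuals by number of qualifications obtained) can be sketched as follows. This is an illustrative sketch only. It assumes that the application records are available as (applicant identifier, outcome) pairs, one per SD and role; that input format and the function name are hypothetical. Counts above `cap` are pooled into a single ">5" bin, as in the table.

```python
from collections import Counter


def applications_histogram(applications, cap=5):
    """Tabulate how many individuals submitted n applications and how many
    acquired n qualifications, pooling n > cap into a single bin.

    `applications` is a list of (applicant_id, qualified) pairs, one per
    application (i.e., per SD and role).
    """
    submitted = Counter()
    acquired = Counter()  # only applicants with at least one qualification
    for applicant, qualified in applications:
        submitted[applicant] += 1
        if qualified:
            acquired[applicant] += 1

    def histogram(per_person):
        # Map each person's total to its bin; totals above `cap` share one bin.
        return dict(Counter(n if n <= cap else f">{cap}" for n in per_person.values()))

    return histogram(submitted), histogram(acquired)


# Hypothetical records (illustration only).
records = [("a", True), ("b", False), ("b", True)] + [("c", False)] * 3 + [("d", False)] * 7
print(applications_histogram(records))
```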

The existence of individuals who qualified in two different SDs, say i and j, indicates that some overlap may exist between the scopes of i and j, fostered by the personal interests of researchers working across disciplinary boundaries. In this section we study co-qualifications in more detail, as a proxy for the level of affinity among SDs.

For each pair of disciplines i, j with $i \neq j$, we define the co-qualification strength $M_{ij}$ as the fraction of applicants that qualified in either i or j who qualified in both:

$$M_{ij} = \frac{\text{N. of applicants that qualified in both SD } i \text{ and } j}{\text{N. of applicants that qualified in either SD } i \text{ or } j}$$

By definition, $0 \leq M_{ij} \leq 1$ and $M_{ij} = M_{ji}$. If $M_{ij} = 0$, then no applicant received qualification in both i and j; this suggests that disciplines i and j might be unrelated. $M_{ij} = 1$ means that every applicant that qualified for either SD qualified for both, i.e., SDs i and j have exactly the same set of qualified applicants. It turns out that co-qualifications across disciplines are relatively rare: only 531 out of $170 \times 169 / 2 = 14{,}365$ pairs have nonzero co-qualification strength; the five-number summary of the nonzero values of the co-qualification matrix is shown in Table B.9.
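The numerator of $M_{ij}$ counts applicants in the intersection of the two sets of qualified applicants and the denominator counts those in their union, so $M_{ij}$ is the Jaccard index of the two sets. The following minimal sketch assumes that the qualified applicants of each SD are available as sets of identifiers. It pools both roles, which is an assumption, since the text does not say how roles are combined. The function name and toy data are hypothetical. Only pairs with nonzero strength are returned, since these are the edges of the co-qualification graph.

```python
from itertools import combinations


def co_qualification_strength(qualified_by_sd):
    """Compute M_ij for every pair of SDs with at least one co-qualified
    applicant.

    `qualified_by_sd` maps each SD code to the set of applicants that
    qualified in it. M_ij = |Q_i & Q_j| / |Q_i | Q_j| (Jaccard index).
    """
    strengths = {}
    for i, j in combinations(sorted(qualified_by_sd), 2):
        qi, qj = qualified_by_sd[i], qualified_by_sd[j]
        both = len(qi & qj)
        if both:  # zero-strength pairs are not edges of the graph
            strengths[(i, j)] = both / len(qi | qj)
    return strengths


# Hypothetical qualified sets (illustration only).
toy = {"05/E1": {"a", "b", "c"}, "05/E2": {"b", "c", "d"}, "01/B1": {"e"}}
print(co_qualification_strength(toy))
```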
Figure B.11: Co-qualification graph (best viewed in color). Colors denote the 14 scientific areas. Node sizes are proportional to the number of incident edges; edge widths are proportional to co-qualification strengths. See text for details.


An effective way to visualize co-qualifications is to draw the co-qualification graph G (Figure B.11). G is a weighted, undirected graph where each node represents an SD, and two nodes i, j are connected by an edge of weight $M_{ij}$ if and only if there exists at least one applicant that qualified in both i and j.

The co-qualification graph has 170 nodes and 531 edges. We use colors to distinguish the 14 scientific areas. Node sizes are proportional to the number of incident edges, and edge widths are proportional to the co-qualification strength: thick edges denote a higher fraction of co-qualified applicants (i.e., higher values of $M_{ij}$). We used Gephi (Bastian et al., 2009) and igraph (Csardi & Nepusz, 2006) to draw G and compute the metrics described in the following.

To study the relationships among SDs we look for two important structural patterns in the co-qualification graph: hubs and cliques. A hub is a node with a large number of neighbors, such as node E in Figure B.12(a). A hub in G can be interpreted as a “general” discipline that partially overlaps with more specific ones that are not necessarily related to each other. A clique is a complete subgraph, i.e., a subset of nodes that are pairwise connected by an edge; as an example, nodes {A, B, C, D, E} in Figure B.12(b) form a clique. Cliques in the co-qualification graph represent disciplines having mutual overlap, identifying a broader area of related research activities.

Figure B.12: Important sub-structures of the co-qualification graph: (a) hub; (b) clique.

Scientific discipline                                                 N. of neighbors
05/F1-Experimental biology                                            28
05/E1-General biochemistry and clinical biochemistry                  28
05/E2-Molecular biology                                               26
06/N1-Applied medical technologies                                    23
06/A2-Experimental medicine, pathophysiology and clinical pathology   21
05/C1-Ecology                                                         19
05/D1-Physiology                                                      18
02/B3-Applied physics                                                 16
06/D6-Neurology                                                       16
02/B1-Experimental physics of matter                                  15

Table B.10: The ten disciplines with the highest degree in the co-qualification graph.

Hubs can be identified by looking at the node degree distribution of G. The degree d(v) of a node v is the number of incident edges (an edge is incident to a node if one of its endpoints is that node). The hubs in G are the disciplines with the highest degree. The ten biggest hubs are shown in Table B.10. Five of them (05/F1-Experimental biology, 05/E1-General biochemistry and clinical biochemistry, 05/E2-Molecular biology, 05/C1-Ecology, and 05/D1-Physiology) belong to area BIO; three (06/N1-Applied medical technologies, 06/A2-Experimental medicine, pathophysiology and clinical pathology, and 06/D6-Neurology) belong to area MED, and the remaining two (02/B3-Applied physics and 02/B1-Experimental physics of matter) belong to area PHY.
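Computing node degrees and ranking hubs as in Table B.10 takes only a few lines. The authors used igraph for these metrics; the stand-alone sketch below is only illustrative. It assumes that the nonzero co-qualification strengths are available as a dictionary keyed by SD pairs, which is a hypothetical format, as are the function names. Ties in degree are broken alphabetically by SD code, an arbitrary choice that may not match the ordering in Table B.10.

```python
from collections import defaultdict


def build_graph(strengths):
    """Build an undirected weighted adjacency map from {(i, j): M_ij}."""
    adj = defaultdict(dict)
    for (i, j), weight in strengths.items():
        adj[i][j] = weight
        adj[j][i] = weight
    return adj


def top_hubs(adj, k=10):
    """Return the k nodes with the highest degree (number of incident edges)."""
    degrees = {node: len(neighbours) for node, neighbours in adj.items()}
    # Sort by decreasing degree, then by node label to make ties deterministic.
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical edge weights (illustration only).
toy_strengths = {("A", "B"): 0.1, ("A", "C"): 0.2, ("A", "D"): 0.05, ("B", "C"): 0.3}
print(top_hubs(build_graph(toy_strengths), k=2))
```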

The co-qualification graph contains several cliques. A maximal clique is a clique $G' \subseteq G$ such that no node can be added to $G'$ to form a bigger clique. The largest clique in G has size 9 and consists of the following disciplines (all belonging to areas BIO and MED):


• 05/H1-Human anatomy, 05/F1-Experimental biology, 05/H2-Histology, 05/B2-Comparative anatomy and cytology, 06/N1-Applied medical technologies, 06/A1-Medical genetics, 06/A2-Experimental medicine, pathophysiology and clinical pathology, 05/E2-Molecular biology, and 05/E1-General biochemistry and clinical biochemistry.


The ties between disciplines in areas MED and BIO are confirmed by the existence of three maximal cliques of size 8, consisting of the following disciplines:


• 05/I1-Genetics and microbiology, 05/F1-Experimental biology, 05/B2-Comparative anatomy and cytology, 05/H2-Histology, 06/A1-Medical genetics, 05/E2-Molecular biology, 05/E1-General biochemistry and clinical biochemistry, and 06/A2-Experimental medicine, pathophysiology and clinical pathology.


• 05/H1-Human anatomy, 05/F1-Experimental biology, 05/H2-Histology, 05/B2-Comparative anatomy and cytology, 06/N1-Applied medical technologies, 05/D1-Physiology, 05/E2-Molecular biology, and 05/E1-General biochemistry and clinical biochemistry.
• 05/H1-Human anatomy, 05/F1-Experimental biology, 05/H2-Histology, 05/B2-Comparative anatomy and cytology, 06/N1-Applied medical technologies, 06/A1-Medical genetics, 06/A2-Experimental medicine, pathophysiology and clinical pathology, and 06/D6-Neurology.

Other smaller cliques exist: 5 maximal cliques of size 7, 17 maximal cliques of size 6, and 133 maximal cliques 
of size between 3 and 5 inclusive. 
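Maximal cliques like those listed above can be enumerated with the Bron-Kerbosch algorithm. igraph, which the authors used, offers its own maximal-clique enumeration. The following self-contained Python version with pivoting is only a sketch. It assumes an adjacency map from each node to its neighbours; the function name and the toy graph are hypothetical.

```python
from collections import Counter


def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph.

    Bron-Kerbosch algorithm with pivoting; `adj` maps each node to the
    collection of its neighbours.
    """
    neighbours = {node: set(nbrs) for node, nbrs in adj.items()}
    cliques = []

    def expand(r, p, x):
        # r: current clique; p: candidates that extend it; x: already handled.
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        # Choose the pivot with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(neighbours[u] & p))
        for v in list(p - neighbours[pivot]):
            expand(r | {v}, p & neighbours[v], x & neighbours[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(neighbours), set())
    return cliques


# Hypothetical graph: triangle A-B-C plus the edge C-D (illustration only).
toy_adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
cliques = maximal_cliques(toy_adj)
print(sorted(sorted(c) for c in cliques))
print(Counter(len(c) for c in cliques))  # number of maximal cliques per size
```

Counting the returned cliques by size yields statistics of the kind reported above (e.g., how many maximal cliques have size 7 or 6).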


Appendix C. Most frequent publication types for each scientific discipline 

The following table lists the four most frequent publication types for each SD. We use the following keys: 
ABSJ = Abstract in journal; ABSP = Abstract in proceedings; AF = Artifact; ART = Art prototype; BIB = Bibliography; BIBE = Bibliographic entry; CAT = Catalogue entry; CH = Chart; CHAP = Book chapter; COM = Composition; COMM = Scientific commentary; CONC = Concordance; CRIT = Critical edition of books/archaeological excavation; CUR = Curatorship; DB = Database; DES = Design; DICT = Dictionary or encyclopedia entry; DRAW = Drawing; EXH = Exhibition; EXP = Exposition; IDX = Index; INTRO = Introduction; JRNL = Journal paper; MONO = Monograph or scientific treatise; OP = Other publication types; PAT = Patent; PERF = Performance; POS = Poster; PREF = Preface/postface; PROC = Paper in proceedings; REVJ = Review in journal; REVV = Review in volume; SRC = Publication of new literary or archivistic document; SW = Software; TRB = Book translation; TRJ = Translation in journal; TRV = Translation in volume; VERD = Comment of verdict.


Table C.11: Four most frequent publication types for each scientific discipline.

SD     1st                   2nd                   3rd                   4th                   Other
01/A1  JRNL   1487 (42.26%)  PROC    848 (24.10%)  CHAP    542 (15.40%)  DICT    138 (3.92%)      504 (14.32%)
01/A2  JRNL   3369 (84.65%)  PROC    264 (6.63%)   OP      144 (3.62%)   CHAP    109 (2.74%)       94 (2.36%)
01/A3  JRNL   6564 (84.38%)  PROC    567 (7.29%)   CHAP    262 (3.37%)   OP      237 (3.05%)      149 (1.91%)
01/A5  JRNL   1415 (65.30%)  PROC    432 (19.94%)  CHAP    172 (7.94%)   OP       50 (2.31%)       98 (4.51%)
01/A6  JRNL   1031 (55.46%)  PROC    497 (26.73%)  CHAP    152 (8.18%)   ABSP     89 (4.79%)       90 (4.84%)
01/B1  PROC  13318 (57.16%)  JRNL   6353 (27.26%)  CHAP   2116 (9.08%)   CUR     526 (2.26%)      988 (4.24%)
02/A1  JRNL 157547 (94.91%)  PROC   6774 (4.08%)   OP      620 (0.37%)   CHAP    390 (0.23%)      672 (0.41%)
02/A2  JRNL  23983 (80.09%)  PROC   4740 (15.83%)  CHAP    517 (1.73%)   OP      164 (0.55%)      542 (1.80%)
02/B1  JRNL  41390 (81.24%)  PROC   6745 (13.24%)  CHAP   1269 (2.49%)   ABSP    698 (1.37%)      848 (1.66%)
02/B2  JRNL  20305 (86.23%)  PROC   1754 (7.45%)   CHAP    791 (3.36%)   ABSP    267 (1.13%)      430 (1.83%)
02/B3  JRNL  18162 (71.45%)  PROC   4370 (17.19%)  ABSP    884 (3.48%)   CHAP    859 (3.38%)     1143 (4.50%)
02/C1  JRNL  26605 (79.91%)  PROC   4937 (14.83%)  ABSP    617 (1.85%)   CHAP    447 (1.34%)      689 (2.07%)
03/A1  JRNL   7597 (68.61%)  PROC   1617 (14.60%)  ABSP    890 (8.04%)   CHAP    765 (6.91%)      204 (1.84%)
03/A2  JRNL  16780 (84.15%)  PROC   1328 (6.66%)   CHAP    732 (3.67%)   ABSP    637 (3.19%)      463 (2.33%)
03/B1  JRNL  24713 (84.11%)  PROC   1594 (5.42%)   ABSP   1415 (4.82%)   CHAP    913 (3.11%)      748 (2.54%)
03/B2  JRNL  13866 (68.40%)  PROC   3659 (18.05%)  ABSP   1174 (5.79%)   CHAP    730 (3.60%)      844 (4.16%)
03/C1  JRNL   9291 (74.31%)  PROC   1712 (13.69%)  CHAP    552 (4.41%)   ABSP    434 (3.47%)      514 (4.12%)
03/C2  JRNL   4547 (61.45%)  PROC   1585 (21.42%)  ABSP    532 (7.19%)   CHAP    352 (4.76%)      383 (5.18%)
03/D1  JRNL  10941 (75.61%)  PROC   1324 (9.15%)   ABSP    948 (6.55%)   PAT     437 (3.02%)      820 (5.67%)
03/D2  JRNL   3502 (52.92%)  PROC   1648 (24.90%)  ABSP    895 (13.52%)  CHAP    222 (3.35%)      351 (5.31%)
04/A1  JRNL   6719 (60.94%)  ABSP   1509 (13.69%)  PROC   1350 (12.24%)  CHAP    538 (4.88%)      910 (8.25%)
04/A2  JRNL   6606 (60.01%)  PROC   1360 (12.35%)  ABSP    977 (8.88%)   CHAP    829 (7.53%)     1236 (11.23%)
04/A3  JRNL   3620 (41.34%)  PROC   2193 (25.04%)  CHAP   1134 (12.95%)  ABSP    780 (8.91%)     1030 (11.76%)
04/A4  JRNL   3939 (65.85%)  PROC    946 (15.81%)  ABSP    481 (8.04%)   CHAP    357 (5.97%)      259 (4.33%)
05/A1  JRNL   8641 (59.94%)  PROC   2612 (18.12%)  ABSP   1201 (8.33%)   CHAP   1129 (7.83%)      832 (5.78%)
05/A2  JRNL   1893 (72.47%)  PROC    250 (9.57%)   CHAP    199 (7.62%)   ABSP    154 (5.90%)      116 (4.44%)
05/B1  JRNL   8267 (63.70%)  ABSP   1772 (13.65%)  PROC   1188 (9.15%)   CHAP   1046 (8.06%)      706 (5.44%)
05/B2  JRNL   4347 (74.51%)  PROC    628 (10.76%)  ABSP    315 (5.40%)   CHAP    248 (4.25%)      296 (5.08%)
05/C1  JRNL   9785 (67.39%)  PROC   1573 (10.83%)  ABSP   1534 (10.56%)  CHAP    914 (6.29%)      714 (4.93%)
05/D1  JRNL   8545 (73.32%)  PROC   1230 (10.55%)  ABSJ    629 (5.40%)   CHAP    474 (4.07%)      777 (6.66%)
05/E1  JRNL  31942 (77.93%)  PROC   2872 (7.01%)   ABSP   1657 (4.04%)   CHAP   1476 (3.60%)     3043 (7.42%)
05/E2  JRNL  16448 (81.51%)  PROC   1176 (5.83%)   ABSP    735 (3.64%)   ABSJ    604 (2.99%)     1215 (6.03%)
05/F1  JRNL  26147 (76.72%)  ABSJ   2036 (5.97%)   PROC   1991 (5.84%)   ABSP   1669 (4.90%)     2239 (6.57%)
05/G1  JRNL  11862 (78.12%)  PROC    960 (6.32%)   ABSP    856 (5.64%)   ABSJ    668 (4.40%)      839 (5.52%)
05/H1  JRNL   6237 (72.72%)  PROC   1144 (13.34%)  ABSP    428 (4.99%)   ABSJ    394 (4.59%)      374 (4.36%)
05/H2  JRNL   2966 (80.80%)  CHAP    217 (5.91%)   PROC    193 (5.26%)   ABSP     85 (2.32%)      210 (5.71%)
05/I1  JRNL   3761 (80.95%)  ABSP    326 (7.02%)   CHAP    250 (5.38%)   PROC    155 (3.34%)      154 (3.31%)
06/A1  JRNL   9269 (73.76%)  ABSJ   1451 (11.55%)  ABSP    784 (6.24%)   POS     342 (2.72%)      720 (5.73%)
06/A2  JRNL  22580 (83.24%)  PROC   1221 (4.50%)   CHAP    992 (3.66%)   ABSJ    606 (2.23%)     1727 (6.37%)
06/A3  JRNL   4681 (68.23%)  PROC    771 (11.24%)  ABSP    625 (9.11%)   ABSJ    231 (3.37%)      553 (8.05%)
06/A4  JRNL  10291 (84.56%)  ABSJ    635 (5.22%)   ABSP    435 (3.57%)   PROC    379 (3.11%)      430 (3.54%)
06/B1  JRNL  24506 (74.67%)  ABSJ   2827 (8.61%)   PROC   2327 (7.09%)   CHAP   1367 (4.17%)     1792 (5.46%)
06/C1  JRNL  30621 (56.47%)  ABSP   6951 (12.82%)  PROC   6589 (12.15%)  ABSJ   5830 (10.75%)    4236 (7.81%)
06/D1  JRNL  18763 (76.95%)  ABSJ   2044 (8.38%)   PROC   1568 (6.43%)   CHAP   1060 (4.35%)      949 (3.89%)
06/D2  JRNL  16398 (76.87%)  ABSP   1938 (9.08%)   PROC    920 (4.31%)   CHAP    912 (4.28%)     1165 (5.46%)
06/D4  JRNL  35973 (76.16%)  ABSJ   3582 (7.58%)   ABSP   2967 (6.28%)   CHAP   2375 (5.03%)     2337 (4.95%)
06/D5  JRNL   7876 (71.24%)  CHAP    953 (8.62%)   PROC    913 (8.26%)   ABSJ    573 (5.18%)      740 (6.70%)
06/D6  JRNL  22221 (77.58%)  ABSJ   2935 (10.25%)  CHAP   1313 (4.58%)   ABSP    962 (3.36%)     1210 (4.23%)
06/E2  JRNL   7343 (51.25%)  ABSP   2886 (20.14%)  ABSJ   1797 (12.54%)  PROC   1293 (9.02%)     1009 (7.05%)
06/E3  JRNL   6269 (64.46%)  ABSP   1179 (12.12%)  PROC    815 (8.38%)   CHAP    650 (6.68%)      812 (8.36%)
06/F1  JRNL  11694 (62.97%)  PROC   2910 (15.67%)  ABSP   1284 (6.91%)   ABSJ   1047 (5.64%)     1635 (8.81%)
06/F2  JRNL   2086 (59.28%)  PROC    491 (13.95%)  ABSP    462 (13.13%)  ABSJ    202 (5.74%)      278 (7.90%)
06/F3  JRNL   7486 (59.48%)  ABSP   1832 (14.56%)  PROC   1500 (11.92%)  CHAP   1066 (8.47%)      702 (5.57%)
06/F4  JRNL   6870 (55.19%)  PROC   1988 (15.97%)  ABSP   1183 (9.50%)   ABSJ    979 (7.87%)     1427 (11.47%)
06/G1  JRNL  21611 (74.44%)  ABSJ   2601 (8.96%)   PROC   1556 (5.36%)   ABSP   1397 (4.81%)     1865 (6.43%)
06/H1  JRNL  13354 (68.32%)  ABSP   1867 (9.55%)   PROC   1408 (7.20%)   ABSJ   1386 (7.09%)     1530 (7.84%)
06/I1  JRNL  16560 (58.21%)  ABSP   3532 (12.41%)  ABSJ   3429 (12.05%)  PROC   2130 (7.49%)     2800 (9.84%)
06/L1  JRNL   3154 (60.42%)  ABSP    549 (10.52%)  ABSJ    459 (8.79%)   CHAP    378 (7.24%)      680 (13.03%)
06/M1  JRNL  17965 (75.45%)  PROC   2124 (8.92%)   ABSP   1344 (5.64%)   CHAP    934 (3.92%)     1442 (6.07%)
06/M2  JRNL   6911 (59.64%)  PROC   2234 (19.28%)  ABSP   1085 (9.36%)   CHAP    629 (5.43%)      728 (6.29%)
06/N1  JRNL  17614 (77.38%)  PROC   1558 (6.84%)   ABSJ   1275 (5.60%)   ABSP   1024 (4.50%)     1291 (5.68%)
07/A1  JRNL   2252 (39.77%)  CHAP   1623 (28.66%)  PROC   1105 (19.52%)  MONO    279 (4.93%)      403 (7.12%)
07/B1  JRNL   4043 (54.81%)  PROC   2270 (30.78%)  CHAP    514 (6.97%)   ABSP    251 (3.40%)      298 (4.04%)
07/B2  JRNL   4843 (57.74%)  PROC   2022 (24.11%)  CHAP    632 (7.53%)   ABSP    503 (6.00%)      388 (4.62%)
07/C1  JRNL   1725 (41.92%)  PROC   1570 (38.15%)  CHAP    377 (9.16%)   ABSP    220 (5.35%)      223 (5.42%)

07/DI 

JRNL 

6609 (52.01%) 

PROC 

2403 (18.91%) 

ABSP 

1801 (14.17%) 

ABSJ 

720 (5.67%) 

1174 (9.24%) 

07/El 

JRNL 

5242 (51.55%) 

PROC 

2202 (21.65%) 

ABSP 

1347 (13.25%) 

CHAP 

704 (6.92%) 

674 (6.63%) 

07/FI 

JRNL 

3917 (56.49%) 

PROC 

1740 (25.09%) 

ABSP 

555 (8.00%) 

CHAP 

376 (5.42%) 

346 (5.00%) 

07/F2 

JRNL 

2633 (62.57%) 

PROC 

610(14.50%) 

ABSP 

519(12.33%) 

CHAP 

303 (7.20%) 

143 (3.40%) 

07/G1 

JRNL 

5106 (51.04%) 

PROC 

3077 (30.76%) 

ABSP 

781 (7.81%) 

ABSJ 

371 (3.71%) 

668 (6.68%) 

07/H2 

JRNL 

3434 (55.09%) 

PROC 

1777 (28.50%) 

ABSP 

481 (7.72%) 

ABSJ 

244 (3.91%) 

298 (4.78%) 

07/H3 

JRNL 

3567 (62.73%) 

PROC 

1374 (24.16%) 

ABSP 

390 (6.86%) 

ABSJ 

126 (2.22%) 

229 (4.03%) 

07/H4 

JRNL 

1604 (50.79%) 

PROC 

842 (26.66%) 

ABSP 

347 (10.99%) 

ABSJ 

201 (6.36%) 

164 (5.20%) 

08/A2 

PROC 

2787 (48.95%) 

JRNL 

1915(33.63%) 

CHAP 

651 (11.43%) 

ABSP 

117(2.05%) 

224 (3.94%) 

08/A3 

PROC 

1721 (40.20%) 

JRNL 

1146 (26.77%) 

CHAP 

968 (22.61%) 

MONO 

151 (3.53%) 

295 (6.89%) 

08/A4 

PROC 

1606 (52.67%) 

JRNL 

1069 (35.06%) 

CHAP 

260 (8.53%) 

MONO 

32(1.05%) 

82 (2.69%) 

08/BI 

PROC 

1989 (58.07%) 

JRNL 

889 (25.96%) 

CHAP 

260 (7.59%) 

OP 

114(3.33%) 

173 (5.05%) 

08/B2 

PROC 

2272 (48.76%) 

JRNL 

1763 (37.83%) 

CHAP 

278 (5.97%) 

ABSP 

177 (3.80%) 

170 (3.64%) 

08/B3 

PROC 

4945 (61.94%) 

JRNL 

1986 (24.88%) 

CHAP 

687 (8.61%) 

OP 

190 (2.38%) 

175 (2.19%) 

08/Cl 

JRNL 

4268 (31.82%) 

CHAP 

3500 (26.09%) 

PROC 

3018 (22.50%) 

MONO 

793 (5.91%) 

1834(13.68%) 

08/DI 

JRNL 

3975 (34.79%) 

CHAP 

3535 (30.94%) 

MONO 

674 (5.90%) 

CUR 

585 (5.12%) 

2658 (23.25%) 

08/El 

CHAP 

1770 (34.05%) 

PROC 

1377 (26.49%) 

JRNL 

766 (14.74%) 

MONO 

369 (7.10%) 

916(17.62%) 

08/E2 

CHAP 

3575 (38.04%) 

JRNL 

1919 (20.42%) 

PROC 

1277 (13.59%) 

CAT 

630 (6.70%) 

1996 (21.25%) 

08/FI 

CHAP 

5447 (34.94%) 

JRNL 

4751 (30.48%) 

PROC 

2082 (13.36%) 

MONO 

866 (5.56%) 

2443 (15.66%) 

09/A1 

PROC 

5292 (56.96%) 

JRNL 

3001 (32.30%) 

CHAP 

344 (3.70%) 

OP 

333 (3.58%) 

321 (3.46%) 

09/A2 

PROC 

2227 (57.18%) 

JRNL 

1267 (32.53%) 

CHAP 

193 (4.96%) 

OP 

76(1.95%) 

132 (3.38%) 

09/A3 

PROC 

6278 (53.91%) 

JRNL 

4326 (37.15%) 

CHAP 

497 (4.27%) 

ABSP 

147 (1.26%) 

397 (3.41%) 

09/BI 

PROC 

2065 (47.19%) 

JRNL 

1840 (42.05%) 

CHAP 

298 (6.81%) 

PAT 

63 (1.44%) 

110(2.51%) 

09/B2 

PROC 

1912(53.75%) 

JRNL 

1218(34.24%) 

CHAP 

162 (4.55%) 

MONO 

125 (3.51%) 

140 (3.95%) 

09/B3 

PROC 

1771 (45.79%) 

JRNL 

1254 (32.42%) 

CHAP 

578 (14.94%) 

OP 

91 (2.35%) 

174 (4.50%) 

09/Cl 

PROC 

3857 (61.53%) 

JRNL 

1833 (29.24%) 

CHAP 

281 (4.48%) 

OP 

114(1.82%) 

183 (2.93%) 

09/C2 

PROC 

5338 (50.77%) 

JRNL 

4054 (38.56%) 

CHAP 

395 (3.76%) 

MONO 

256 (2.43%) 

471 (4.48%) 

09/DI 

JRNL 

6988 (55.65%) 

PROC 

3906 (31.11%) 

ABSP 

573 (4.56%) 

CHAP 

559 (4.45%) 

531 (4.23%) 

09/D2 

JRNL 

3476 (48.07%) 

PROC 

2741 (37.91%) 

CHAP 

421 (5.82%) 

ABSP 

339 (4.69%) 

254 (3.51%) 

09/D3 

JRNL 

3464 (49.51%) 

PROC 

2611 (37.32%) 

CHAP 

468 (6.69%) 

ABSP 

234 (3.34%) 

220 (3.14%) 

09/El 

PROC 

2855 (44.56%) 

JRNL 

2739 (42.75%) 

ABSP 

384 (5.99%) 

CHAP 

263 (4.10%) 

166 (2.60%) 

09/E2 

PROC 

5841 (68.73%) 

JRNL 

2244 (26.40%) 

CHAP 

159(1.87%) 

MONO 

83 (0.98%) 

172 (2.02%) 

09/E3 

PROC 

6793 (49.16%) 

JRNL 

5939 (42.98%) 

CHAP 

397 (2.87%) 

PAT 

304 (2.20%) 

384 (2.79%) 

09/E4 

PROC 

3816(61.77%) 

JRNL 

1920 (31.08%) 

CHAP 

146 (2.36%) 

ABSP 

111 (1.80%) 

185 (2.99%) 

09/FI 

PROC 

4165 (51.45%) 

JRNL 

3359 (41.49%) 

ABSP 

216 (2.67%) 

CHAP 

182 (2.25%) 

174 (2.14%) 

09/F2 

PROC 

7341 (61.78%) 

JRNL 

3677 (30.95%) 

CHAP 

368 (3.10%) 

PAT 

284 (2.39%) 

212(1.78%) 

09/G1 

PROC 

5778 (57.62%) 

JRNL 

3256 (32.47%) 

CHAP 

579 (5.77%) 

MONO 

120(1.20%) 

294 (2.94%) 

09/G2 

JRNL 

6124 (52.37%) 

PROC 

3394 (29.03%) 

ABSP 

723 (6.18%) 

CHAP 

613 (5.24%) 

839 (7.18%) 

10/A 1 

CHAP 

9208 (31.89%) 

JRNL 

7511 (26.01%) 

PROC 

5722(19.81%) 

CAT 

1414 (4.90%) 

5023 (17.39%) 

10/BI 

CHAP 

5711 (33.88%) 

CAT 

3368 (19.98%) 

JRNL 

2617 (15.52%) 

DICT 

1313(7.79%) 

3850 (22.83%) 

10/Cl 

CHAP 

3895 (30.24%) 

JRNL 

3382 (26.26%) 

PROC 

1043 (8.10%) 

CUR 

935 (7.26%) 

3624 (28.14%) 

10/DI 

JRNL 

757 (28.74%) 

CHAP 

687 (26.08%) 

PROC 

365 (13.86%) 

REVJ 

310(11.77%) 

515(19.55%) 

10/D2 

JRNL 

2976 (34.75%) 

CHAP 

1585 (18.51%) 

REVJ 

1203 (14.05%) 

DICT 

730 (8.53%) 

2069 (24.16%) 

10/D3 

JRNL 

1283 (33.52%) 

CHAP 

816(21.32%) 

REVJ 

747 (19.52%) 

MONO 

239 (6.25%) 

742(19.39%) 

10/D4 

JRNL 

2657 (36.27%) 

CHAP 

1736 (23.70%) 

REVJ 

870(11.88%) 

PROC 

540 (7.37%) 

1523 (20.78%) 

10/El 

JRNL 

1550 (27.39%) 

CHAP 

1254 (22.16%) 

REVJ 

750(13.26%) 

DICT 

428 (7.56%) 

1676 (29.63%) 

10/FI 

JRNL 

2830 (27.17%) 

CHAP 

2642 (25.37%) 

REVJ 

1250 (12.00%) 

PROC 

877 (8.42%) 

2816 (27.04%) 

10/F2 

JRNL 

2502 (30.41%) 

CHAP 

1908 (23.19%) 

REVJ 

854(10.38%) 

MONO 

689 (8.37%) 

2274 (27.65%) 

10/F3 

JRNL 

2344 (27.26%) 

CHAP 

2124 (24.70%) 

DICT 

862 (10.02%) 

REVJ 

829 (9.64%) 

2440 (28.38%) 

10/G1 

CHAP 

2166 (30.23%) 

JRNL 

1957 (27.31%) 

PROC 

819(11.43%) 

MONO 

520 (7.26%) 

1704 (23.77%) 

10/HI 

CHAP 

1436 (23.96%) 

JRNL 

1434 (23.92%) 

REVJ 

781 (13.03%) 

PROC 

600(10.01%) 

1743 (29.08%) 
Table C.11 (continued): Four most frequent publication types for each scientific discipline.

SD | 1st most common type | 2nd | 3rd | 4th | Other
10/I1 | JRNL 1129 (24.37%) | CHAP 1128 (24.35%) | PROC 574 (12.39%) | REVJ 392 (8.46%) | 1409 (30.43%)
10/L1 | CHAP 3065 (30.17%) | JRNL 3014 (29.67%) | REVJ 757 (7.45%) | MONO 701 (6.90%) | 2622 (25.81%)
10/M1 | CHAP 1213 (30.94%) | JRNL 847 (21.61%) | REVJ 383 (9.77%) | PROC 300 (7.65%) | 1177 (30.03%)
10/M2 | CHAP 633 (29.28%) | JRNL 526 (24.33%) | PROC 234 (10.82%) | REVJ 233 (10.78%) | 536 (24.79%)
10/N1 | JRNL 1742 (27.19%) | CHAP 1548 (24.16%) | REVJ 625 (9.76%) | PROC 605 (9.44%) | 1886 (29.45%)
10/N3 | CHAP 1200 (24.46%) | JRNL 1047 (21.34%) | DICT 490 (9.99%) | REVJ 475 (9.68%) | 1694 (34.53%)
11/A2 | CHAP 1800 (30.29%) | JRNL 1553 (26.14%) | DICT 591 (9.95%) | REVJ 531 (8.94%) | 1467 (24.68%)
11/A5 | CHAP 1264 (35.43%) | JRNL 988 (27.69%) | MONO 353 (9.89%) | CUR 286 (8.02%) | 677 (18.97%)
11/B1 | CHAP 4145 (35.32%) | JRNL 2822 (24.05%) | PROC 1558 (13.28%) | REVJ 724 (6.17%) | 2487 (21.18%)
11/C1 | JRNL 2145 (30.62%) | CHAP 1596 (22.78%) | REVJ 635 (9.06%) | MONO 624 (8.91%) | 2005 (28.63%)
11/C3 | CHAP 1324 (25.72%) | JRNL 1318 (25.60%) | REVJ 603 (11.71%) | MONO 442 (8.59%) | 1461 (28.38%)
11/C5 | JRNL 2882 (24.92%) | CHAP 2687 (23.23%) | REVJ 1420 (12.28%) | DICT 1338 (11.57%) | 3238 (28.00%)
11/D1 | CHAP 992 (32.64%) | JRNL 907 (29.85%) | MONO 311 (10.23%) | REVJ 177 (5.82%) | 652 (21.46%)
11/D2 | CHAP 1109 (34.91%) | JRNL 994 (31.29%) | MONO 337 (10.61%) | PROC 258 (8.12%) | 479 (15.07%)
11/E1 | JRNL 8406 (62.19%) | PROC 1859 (13.75%) | CHAP 1216 (9.00%) | ABSP 824 (6.10%) | 1211 (8.96%)
11/E2 | JRNL 1704 (44.71%) | CHAP 777 (20.39%) | PROC 729 (19.13%) | ABSP 162 (4.25%) | 439 (11.52%)
11/E3 | JRNL 2035 (50.27%) | CHAP 960 (23.72%) | PROC 493 (12.18%) | ABSP 262 (6.47%) | 298 (7.36%)
11/E4 | JRNL 2473 (50.88%) | CHAP 944 (19.42%) | PROC 539 (11.09%) | ABSP 281 (5.78%) | 623 (12.83%)
12/A1 | CHAP 1787 (38.18%) | JRNL 1494 (31.92%) | VERD 641 (13.69%) | MONO 344 (7.35%) | 415 (8.86%)
12/B2 | JRNL 952 (45.40%) | CHAP 679 (32.38%) | VERD 237 (11.30%) | MONO 75 (3.58%) | 154 (7.34%)
12/C1 | JRNL 1945 (39.10%) | CHAP 1788 (35.95%) | MONO 284 (5.71%) | VERD 280 (5.63%) | 677 (13.61%)
12/C2 | JRNL 432 (42.11%) | CHAP 191 (18.62%) | MONO 93 (9.06%) | PROC 92 (8.97%) | 218 (21.24%)
12/D1 | JRNL 1855 (42.38%) | CHAP 1398 (31.94%) | VERD 429 (9.80%) | MONO 229 (5.23%) | 466 (10.65%)
12/D2 | JRNL 725 (47.54%) | CHAP 356 (23.34%) | VERD 317 (20.79%) | MONO 80 (5.25%) | 47 (3.08%)
12/E1 | JRNL 1229 (43.23%) | CHAP 870 (30.60%) | MONO 163 (5.73%) | VERD 109 (3.83%) | 472 (16.61%)
12/E2 | JRNL 1697 (38.71%) | CHAP 1390 (31.71%) | MONO 250 (5.70%) | VERD 215 (4.90%) | 832 (18.98%)
12/E3 | CHAP 1044 (38.42%) | JRNL 969 (35.66%) | VERD 310 (11.41%) | MONO 166 (6.11%) | 228 (8.40%)
12/F1 | JRNL 412 (39.85%) | CHAP 257 (24.85%) | VERD 195 (18.86%) | MONO 62 (6.00%) | 108 (10.44%)
12/G1 | CHAP 748 (40.06%) | JRNL 567 (30.37%) | VERD 195 (10.44%) | MONO 128 (6.86%) | 229 (12.27%)
12/G2 | CHAP 1080 (40.00%) | JRNL 959 (35.52%) | VERD 238 (8.81%) | MONO 149 (5.52%) | 274 (10.15%)
12/H1 | JRNL 365 (35.10%) | CHAP 272 (26.15%) | MONO 128 (12.31%) | PROC 71 (6.83%) | 204 (19.61%)
12/H2 | CHAP 334 (27.72%) | JRNL 259 (21.49%) | REVJ 146 (12.12%) | PROC 137 (11.37%) | 329 (27.30%)
12/H3 | JRNL 1196 (37.32%) | CHAP 793 (24.74%) | MONO 266 (8.30%) | REVJ 212 (6.61%) | 738 (23.03%)
13/A1 | JRNL 4600 (64.85%) | CHAP 1260 (17.76%) | OP 637 (8.98%) | MONO 185 (2.61%) | 411 (5.80%)
13/A2 | JRNL 7127 (54.52%) | CHAP 3066 (23.45%) | OP 1242 (9.50%) | PROC 586 (4.48%) | 1052 (8.05%)
13/A3 | JRNL 1642 (58.94%) | CHAP 574 (20.60%) | OP 243 (8.72%) | PROC 114 (4.09%) | 213 (7.65%)
13/A4 | JRNL 3119 (47.39%) | CHAP 1688 (25.65%) | PROC 607 (9.22%) | OP 591 (8.98%) | 576 (8.76%)
13/A5 | JRNL 759 (69.57%) | CHAP 161 (14.76%) | PROC 69 (6.32%) | OP 69 (6.32%) | 33 (3.03%)
13/B1 | JRNL 2668 (35.50%) | CHAP 2526 (33.61%) | PROC 1015 (13.50%) | MONO 725 (9.65%) | 582 (7.74%)
13/B2 | JRNL 2149 (31.27%) | CHAP 2061 (29.99%) | PROC 1615 (23.50%) | MONO 437 (6.36%) | 610 (8.88%)
13/B3 | CHAP 963 (34.83%) | PROC 738 (26.69%) | JRNL 708 (25.61%) | MONO 148 (5.35%) | 208 (7.52%)
13/B4 | JRNL 1853 (40.10%) | CHAP 1515 (32.79%) | PROC 418 (9.05%) | OP 332 (7.18%) | 503 (10.88%)
13/B5 | PROC 971 (35.43%) | JRNL 915 (33.38%) | CHAP 414 (15.10%) | ABSP 193 (7.04%) | 248 (9.05%)
13/C1 | CHAP 1958 (35.85%) | JRNL 1742 (31.90%) | MONO 472 (8.64%) | REVJ 344 (6.30%) | 945 (17.31%)
13/D1 | JRNL 2346 (47.15%) | PROC 1365 (27.43%) | CHAP 616 (12.38%) | OP 244 (4.90%) | 405 (8.14%)
13/D2 | JRNL 1058 (46.40%) | CHAP 518 (22.72%) | PROC 407 (17.85%) | OP 162 (7.11%) | 135 (5.92%)
13/D3 | CHAP 917 (34.63%) | JRNL 824 (31.12%) | PROC 412 (15.56%) | OP 152 (5.74%) | 343 (12.95%)
13/D4 | JRNL 1705 (61.80%) | CHAP 330 (11.96%) | PROC 314 (11.38%) | ABSP 159 (5.76%) | 251 (9.10%)
14/A1 | JRNL 1808 (34.04%) | CHAP 1332 (25.08%) | MONO 459 (8.64%) | REVJ 434 (8.17%) | 1279 (24.07%)
14/A2 | JRNL 991 (40.32%) | CHAP 790 (32.14%) | MONO 193 (7.85%) | REVJ 161 (6.55%) | 323 (13.14%)
14/B1 | JRNL 1210 (27.60%) | CHAP 1135 (25.89%) | REVJ 562 (12.82%) | MONO 348 (7.94%) | 1129 (25.75%)
14/B2 | JRNL 1452 (30.68%) | CHAP 1365 (28.84%) | REVJ 544 (11.49%) | MONO 423 (8.94%) | 949 (20.05%)
14/C2 | CHAP 1029 (39.81%) | JRNL 774 (29.94%) | MONO 284 (10.99%) | CUR 199 (7.70%) | 299 (11.56%)
14/D1 | CHAP 1753 (39.56%) | JRNL 1388 (31.32%) | MONO 385 (8.69%) | CUR 257 (5.80%) | 648 (14.63%)


Appendix D. Levenshtein distance 


The Levenshtein distance between two strings (sequences of characters) is the minimum number of edit operations required to transform one string into the other. The following single-character edit operations are permitted: (i) deletion of a character; (ii) insertion of a character; (iii) replacement of a character with a different one.

Let $S[1..n]$ and $T[1..m]$ be two strings of length $n := |S|$ and $m := |T|$, respectively. The Levenshtein distance $L(S,T)$ of $S$ and $T$ is the value of the auxiliary function $L_{S,T}(n,m)$, where $L_{S,T}(i,j)$ is defined for all $0 \le i \le n$, $0 \le j \le m$ as follows:


$$
L_{S,T}(i,j) =
\begin{cases}
\max\{i, j\} & \text{if } i = 0 \text{ or } j = 0,\\
\min\{\, L_{S,T}(i-1,j) + 1,\; L_{S,T}(i,j-1) + 1,\; L_{S,T}(i-1,j-1) + \mathbb{1}_{S[i] \neq T[j]} \,\} & \text{otherwise}
\end{cases}
\tag{D.1}
$$



where $\mathbb{1}_P$ is the indicator function, whose value is 1 if the predicate $P$ is true and 0 otherwise. $L_{S,T}(i,j)$ is the minimum number of edit operations needed to transform the prefix $S[1..i]$ of $S$ into the prefix $T[1..j]$ of $T$. If one of the prefixes is empty ($i = 0$ or $j = 0$), the distance is simply the length of the other prefix. If both prefixes are nonempty, $S[1..i]$ can be transformed into $T[1..j]$ in one of three ways:


1. deleting the character $S[i]$ and transforming $S[1..i-1]$ into $T[1..j]$; this requires $L_{S,T}(i-1,j) + 1$ edit operations;

2. transforming $S[1..i]$ into $T[1..j-1]$ and then inserting $T[j]$; this requires $L_{S,T}(i,j-1) + 1$ edit operations;

3. replacing $S[i]$ with $T[j]$ (if they differ; otherwise nothing needs to be done) and transforming $S[1..i-1]$ into $T[1..j-1]$; this requires $L_{S,T}(i-1,j-1) + \mathbb{1}_{S[i] \neq T[j]}$ edit operations.
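The recurrence (D.1) and the three cases above translate directly into a bottom-up table. The following Python sketch is our own illustration of that tabulation, not necessarily the code used for the analysis in this paper; the function name `levenshtein` is hypothetical. Python strings are 0-indexed, so $S[i]$ corresponds to `s[i - 1]`.

```python
def levenshtein(s: str, t: str) -> int:
    """Levenshtein distance computed by tabulating the recurrence (D.1)."""
    n, m = len(s), len(t)
    # table[i][j] holds L_{S,T}(i, j): edits needed to turn s[:i] into t[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                table[i][j] = max(i, j)  # one prefix is empty: length of the other
            else:
                mismatch = 1 if s[i - 1] != t[j - 1] else 0  # indicator 1_{S[i] != T[j]}
                table[i][j] = min(
                    table[i - 1][j] + 1,             # case 1: delete S[i]
                    table[i][j - 1] + 1,             # case 2: insert T[j]
                    table[i - 1][j - 1] + mismatch,  # case 3: replace (or keep) S[i]
                )
    return table[n][m]


print(levenshtein("kitten", "sitting"))  # 3
```

Filling the $(n+1) \times (m+1)$ table row by row guarantees that the three entries each cell depends on are already available.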


The value $L_{S,T}(n,m)$ can be computed in time $O(nm)$ by tabulating all values $L_{S,T}(i,j)$, starting from $L_{S,T}(0,0)$. The Levenshtein distance is zero if and only if $S$ and $T$ are equal; it never exceeds $\max\{|S|,|T|\}$, and this bound is attained, for example, when $S$ and $T$ have no characters in common (e.g., $S$ = “abcdef”, $T$ = “ghijklmnopqrst”). The normalized Levenshtein distance $L_n(S,T)$ is defined as:


$$
L_n(S,T) = \frac{L_{S,T}(|S|,|T|)}{\max\{|S|,|T|\}}
\tag{D.2}
$$


and assumes values in the range [0,1]. 
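A minimal Python sketch of the normalized distance (D.2). The helper is a common space-saving variant of the same dynamic program: since row $i$ of the table depends only on row $i-1$, it keeps two rows instead of the full table. Both function names are our own, and returning 0.0 for two empty strings is an assumption, since (D.2) is undefined in that case.

```python
def levenshtein(s: str, t: str) -> int:
    """Recurrence (D.1), keeping only two rows of the table at a time."""
    prev = list(range(len(t) + 1))  # row i = 0: L(0, j) = j
    for i in range(1, len(s) + 1):
        curr = [i] + [0] * len(t)  # column j = 0: L(i, 0) = i
        for j in range(1, len(t) + 1):
            mismatch = int(s[i - 1] != t[j - 1])  # indicator 1_{S[i] != T[j]}
            curr[j] = min(
                prev[j] + 1,             # delete S[i]
                curr[j - 1] + 1,         # insert T[j]
                prev[j - 1] + mismatch,  # replace (or keep) S[i]
            )
        prev = curr
    return prev[-1]


def normalized_levenshtein(s: str, t: str) -> float:
    """Normalized Levenshtein distance (D.2), a value in [0, 1]."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 0.0  # assumption: (D.2) is undefined for two empty strings
    return levenshtein(s, t) / longest
```

For instance, `normalized_levenshtein("kitten", "sitting")` is 3/7: three edits are needed and the longer string has seven characters.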

By construction, a small difference between short documents results in a larger normalized distance than the same difference between long documents. Formally, given two pairs of documents $S, T$ and $S', T'$ such that $|S| < |S'|$, $|T| < |T'|$ and $L_{S,T}(|S|,|T|) = L_{S',T'}(|S'|,|T'|) > 0$, Equation (D.2) yields $L_n(S,T) > L_n(S',T')$, because the numerators are equal while $\max\{|S|,|T|\} < \max\{|S'|,|T'|\}$. In short, the same (absolute) difference matters more for short documents than for long ones.
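As a small worked instance (the strings are our own illustrative choice), consider a single substitution in a short pair and in a long pair of strings:

$$
L_n(\texttt{cat}, \texttt{cut}) = \frac{1}{\max\{3, 3\}} = \frac{1}{3} \approx 0.333,
\qquad
L_n(\texttt{the cat sat on the mat}, \texttt{the cut sat on the mat}) = \frac{1}{\max\{22, 22\}} = \frac{1}{22} \approx 0.045
$$

Both pairs differ by exactly one edit, yet the normalized distance of the short pair is more than seven times larger.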

It is important to remark that the normalized Levenshtein distance between real-world documents is usually much lower than 1.0. For example, the normalized distance between a portion of the United States Declaration of Independence and a portion of equal length from the Divine Comedy by Italian poet Dante Alighieri is less than 0.8; the distance between two random character sequences is about 0.9.
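The value obtained for random sequences depends on the alphabet and on the sequence length, so it can be worth checking under one's own assumptions. The Python sketch below draws two random strings of 500 characters from lowercase letters and space (both the alphabet and the length are our assumptions, not necessarily the setup behind the figure quoted above) and prints their normalized distance; it is not claimed to reproduce 0.9 exactly.

```python
import random
import string


def levenshtein(s: str, t: str) -> int:
    """Two-row dynamic program for the recurrence (D.1)."""
    prev = list(range(len(t) + 1))  # row i = 0: L(0, j) = j
    for i, a in enumerate(s, start=1):
        curr = [i]  # column j = 0: L(i, 0) = i
        for j, b in enumerate(t, start=1):
            # delete, insert, replace-or-keep (the last uses the indicator a != b)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + int(a != b)))
        prev = curr
    return prev[-1]


def random_text(length: int, rng: random.Random) -> str:
    """Uniformly random text over an assumed alphabet of lowercase letters and space."""
    alphabet = string.ascii_lowercase + " "
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(42)  # fixed seed so the run is repeatable
length = 500
s, t = random_text(length, rng), random_text(length, rng)
# Both strings have the same length, so max{|S|, |T|} = length in (D.2).
print(levenshtein(s, t) / length)
```

Changing the alphabet (e.g. adding uppercase letters and punctuation) or the length will move the result, which is one reason to interpret thresholds on this distance relative to such baselines.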